The presentation is somewhat unconventional. Indeed I begin with a discussion of the basic rules of
mathematical reasoning and of the notion of proof formalized in a natural deduction system
“à la Prawitz”. The rest of the material is more or less traditional but I emphasize partial
functions more than usual (after all, programs may not terminate for all input) and I provide
a fairly complete account of the basic concepts of graph theory.
Preface



The curriculum of most undergraduate programs in computer science includes a course
entitled Discrete Mathematics. These days, given that many students who graduate with a
degree in computer science end up with jobs where mathematical skills seem basically of no
use,¹ one may ask why these students should take such a course. And if they do, what are
the most basic notions that they should learn?

As to the first question, I strongly believe that all computer science students should take
such a course and I will try to justify this assertion below.

The main reason is that, based on my experience of more than twenty-five years of
teaching, I have found that the majority of the students find it very difficult to present an
argument in a rigorous fashion. The notion of a proof is something very fuzzy for most
students and even the need for the rigorous justification of a claim is not so clear to most of
them. Yet, they will all write complex computer programs and it seems rather crucial that
they should understand the basic issues of program correctness. It also seems rather crucial
that they should possess some basic mathematical skills to analyse, even in a crude way,
the complexity of the programs they will write. Don Knuth has argued these points more
eloquently than I can in his beautiful book, Concrete Mathematics, and I will not elaborate
on this any further.

On a scholarly level, I will argue that some basic mathematical knowledge should be part 
of the scientific culture of any computer science student and more broadly, of any engineering 
student. 

Now, if we believe that computer science students should have some basic mathematical 
knowledge, what should it be? 

There is no simple answer. Indeed, students with an interest in algorithms and complexity
will need some discrete mathematics such as combinatorics and graph theory but students
interested in computer graphics or computer vision will need some geometry and some
continuous mathematics. Students interested in databases will need to know some mathematical
logic and students interested in computer architecture will need yet a different brand of
mathematics. So, what’s the common core?

As I said earlier, most students have a very fuzzy idea of what a proof is. This is actually
true of most people! The reason is simple: It is quite difficult to define precisely what a proof
is. To do this, one has to define precisely what the “rules of mathematical reasoning” are,
and this is a lot harder than it looks. Of course, defining and analyzing the notion of proof
is a major goal of mathematical logic.

¹ In fact, some people would even argue that such skills constitute a handicap!

Having attempted some twenty years ago to “demystify” logic for computer scientists
and being an incorrigible optimist, I still believe that there is great value in attempting
to teach people the basic principles of mathematical reasoning in a precise but not overly
formal manner. In these notes, I define the notion of proof as a certain kind of tree whose
inner nodes respect certain proof rules presented in the style of a natural deduction system
“à la Prawitz”. Of course, this has been done before (for example, in van Dalen [42]) but
our presentation has more of a “computer science” flavor which should make it more easily
digestible by our intended audience. Using such a proof system, it is easy to describe very
clearly what a proof by contradiction is and to introduce the subtle notion of “constructive
proof”. We even question the “supremacy” of classical logic, making our students aware of
the fact that there isn’t just one logic, but different systems of logic, which often comes as
a shock to them.

Having provided a firm foundation for the notion of proof, we proceed with a quick and 
informal review of the first seven axioms of Zermelo-Fraenkel set theory. Students are usually
surprised to hear that axioms are needed to ensure such a thing as the existence of the 
union of two sets and I respond by stressing that one should always keep a healthy dose of 
skepticism in life! 

What next? Again, my experience has been that most students do not have a clear
idea of what a function is, let alone a partial function. Yet, computer programs may
not terminate for all inputs, so the notion of partial function is crucial. Thus, we carefully
define relations, functions and partial functions and investigate some of their properties
(being injective, surjective, bijective).

One of the major stumbling blocks for students is the notion of proof by induction and its 
cousin, the definition of functions by recursion. We spend quite a bit of time clarifying these 
concepts and we give a proof of the validity of the induction principle from the fact that the 
natural numbers are well-ordered. We also discuss the pigeonhole principle and some basic 
facts about equinumerosity, without introducing cardinal numbers. 

We introduce some elementary concepts of combinatorics in terms of counting problems. 
We introduce the binomial and multinomial coefficients and study some of their properties 
and we conclude with the Inclusion-Exclusion Principle. 

Next, we introduce partial orders, well-founded sets and complete induction. This way, 
students become aware of the fact that the induction principle applies to sets with an ordering 
far more complex than the ordering on the natural numbers. As an application, we prove
the unique prime factorization in ℤ and discuss GCDs.

Another extremely important concept is that of an equivalence relation and the related 
notion of a partition. 
We have included some material on lattices: Tarski’s fixed point theorem, distributive
lattices, boolean algebras and Heyting algebras. These topics are somewhat more advanced
and can be omitted from the “core”.

The last topic that we consider crucial is graph theory. We give a fairly complete pre- 
sentation of the basic concepts of graph theory: directed and undirected graphs, paths, 
cycles, spanning trees, cocycles, cotrees, flows and tensions, Eulerian and Hamiltonian cycles,
matchings, coverings, and planar graphs. We also discuss the network flow problem and
prove the Max-Flow Min-Cut Theorem in an original way due to M. Sakarovitch. 

These notes grew out of lectures I gave in 2005 while teaching CSE260. There is more
material than can be covered in one semester and some choices have to be made as to what to
omit. Unfortunately, when I taught this course, I was unable to cover any graph theory. I
also did not cover lattices and boolean algebras.

My unconventional approach of starting with logic may not work for everybody, as some
individuals find such material too abstract. It is possible to skip the chapter on logic and
proceed directly with sets, functions, etc. I admit that I have perhaps raised the bar higher
than most other books on discrete mathematics. However, my experience when
teaching CSE260 was that 70% of the students enjoyed the logic material, as it reminded
them of programming. I hope that these notes will inspire and will be useful to motivated
students.
Chapter 1

Mathematical Reasoning, Proof 
Principles and Logic 

1.1 Introduction 



Mathematicians write proofs; most of us write proofs. This leads to the question: Which
principles of reasoning do we use when we write proofs?



The goal of this Chapter is to try to answer this question. We do so by formalizing
the basic rules of reasoning that we use, most of the time unconsciously, in a certain kind
of formalism known as a natural deduction system. We give a (very) quick introduction to
mathematical logic, with a very deliberate proof-theoretic bent, that is, neglecting almost
completely all semantic notions, except at a very intuitive level. We still feel that this
approach is fruitful because the mechanical and rules-of-the-game flavor of proof systems
is much more easily grasped than semantic concepts. In this approach, we follow Peter
Andrews’s motto [1]:



“To truth through proof”.



We present various natural deduction systems due to Prawitz and Gentzen (in more 
modern notation), both in their intuitionistic and classical versions. The adoption of natural
deduction systems as proof systems makes it easy to question the validity of some of the 
inference rules, such as the principle of proof by contradiction. In brief, we try to explain to 
our readers the difference between constructive and classical (i.e., not necessarily construc- 
tive) proofs. In this respect, we plant the seed that there is a deep relationship between 
constructive proofs and the notion of computation (the “Curry-Howard isomorphism” or 
“formulae-as-types principle”, see Section 1.7). 
1.2 Inference Rules, Deductions, The Proof Systems $\mathcal{N}^{\Rightarrow}_m$ and $\mathcal{NG}^{\Rightarrow}_m$

In this section, we review some basic proof principles and attempt to clarify, at least infor- 
mally, what constitutes a mathematical proof. 

In order to define the notion of proof rigorously, we would have to define a formal language 
in which to express statements very precisely and we would have to set up a proof system 
in terms of axioms and proof rules (also called inference rules). We will not go into this; 
this would take too much time and besides, this belongs to a logic course, which is not what 
CSE260 is! Instead, we will content ourselves with an intuitive idea of what a statement is 
and focus on stating as precisely as possible the rules of logic that are used in constructing 
proofs. Readers who really want to see a thorough (and rigorous) introduction to logic are 
referred to Gallier [18], van Dalen [42] or Huth and Ryan [30], a nice text with a Computer
Science flavor. A beautiful exposition of logic (from a proof-theoretic point of view) is also 
given in Troelstra and Schwichtenberg [41], but at a more advanced level. You should also 
be aware of CSE482, a very exciting course about logic and its applications in Computer 
Science. By the way, my book has been out of print for some time but you can get it free 
(as pdf files) from my logic web site:

http://www.cis.upenn.edu/~jean/gbooks/logic.html

In mathematics, we prove statements. Statements may be atomic or compound, that
is, built up from simpler statements using logical connectives, such as implication (if-then),
conjunction (and), disjunction (or), negation (not) and (existential or universal) quantifiers.

As examples of atomic statements, we have: 

1. “a student is eager to learn”.

2. “a student wants an A”.

3. “an odd integer is never 0”.

4. “the product of two odd integers is odd”.

Atomic statements may also contain “variables” (standing for arbitrary objects). For
example:

1. human(x): “x is a human”.

2. needs-to-drink(x): “x needs to drink”.

An example of a compound statement is

$$\mathrm{human}(x) \Rightarrow \text{needs-to-drink}(x).$$
In the above statement, $\Rightarrow$ is the symbol used for logical implication. If we want to assert
that every human needs to drink, we can write

$$\forall x\,(\mathrm{human}(x) \Rightarrow \text{needs-to-drink}(x)).$$

This is read: “for every x, if x is a human then x needs to drink”.

If we want to assert that some human needs to drink we write ($\wedge$ denotes “and”, see below)

$$\exists x\,(\mathrm{human}(x) \wedge \text{needs-to-drink}(x)).$$

This is read: “for some x, x is a human and x needs to drink”.
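For readers who like to experiment, here is a small Python sketch (not part of the formal development, which deliberately avoids semantics) that evaluates such quantified statements over a finite universe: $\forall$ becomes `all` and $\exists$ becomes `any`. The universe, the set of humans and the helper names `check`, `forall`, `exists` are made-up assumptions for illustration. The third value shows why an existential claim such as “some human needs to drink” is normally written with a conjunction: an existential of an implication is already satisfied by any non-human object.

```python
def implies(p, q):
    # material reading of P => Q: false only when P is true and Q is false
    return (not p) or q

def forall(domain, pred):
    return all(pred(x) for x in domain)

def exists(domain, pred):
    return any(pred(x) for x in domain)

# Hypothetical finite universe and extension of human(x).
UNIVERSE = ["alice", "bob", "rock"]
HUMANS = {"alice", "bob"}

def check(drinkers):
    """Evaluate three quantified statements, given the set of objects that need to drink."""
    def human(x):
        return x in HUMANS

    def drinks(x):
        return x in drinkers

    every = forall(UNIVERSE, lambda x: implies(human(x), drinks(x)))
    some_and = exists(UNIVERSE, lambda x: human(x) and drinks(x))
    some_implies = exists(UNIVERSE, lambda x: implies(human(x), drinks(x)))
    return every, some_and, some_implies

print(check({"alice", "bob"}))  # (True, True, True)
print(check(set()))             # (False, False, True): "rock" makes the last one vacuously true
```

With nobody drinking, the conjunctive reading correctly fails, while the implicational one still succeeds thanks to the non-human witness.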

We often denote statements (also called propositions or (logical) formulae) using letters,
such as A, B, P, Q, etc., typically upper-case letters (but sometimes Greek letters, $\varphi$, $\psi$, etc.).

If P and Q are statements, then their conjunction is denoted $P \wedge Q$ (say: P and Q),
their disjunction is denoted $P \vee Q$ (say: P or Q), and their implication $P \Rightarrow Q$ or $P \supset Q$
(say: if P then Q). Some authors use the symbol $\to$ and write an implication as $P \to Q$. We do
not like to use this notation because the symbol $\to$ is already used in the notation for functions
($f\colon A \to B$). We will mostly use the symbol $\Rightarrow$.

We also have the atomic statement $\bot$ (falsity), which corresponds to false (think of it
as the statement which is false no matter what), and the atomic statement $\top$ (truth), which
corresponds to true (think of it as the statement which is always true). The constant $\bot$ is
also called falsum or absurdum. Then, it is convenient to define the negation of P as $P \Rightarrow \bot$
and to abbreviate it as $\neg P$ (or sometimes $\sim P$). Thus, $\neg P$ (say: not P) is just a shorthand
for $P \Rightarrow \bot$.

Whenever necessary to avoid ambiguities, we add matching parentheses: $(P \wedge Q)$, $(P \vee Q)$,
$(P \Rightarrow Q)$. For example, $P \vee Q \wedge R$ is ambiguous; it means either $(P \vee (Q \wedge R))$ or $((P \vee Q) \wedge R)$.

Another important logical operator is equivalence. If P and Q are statements, then their
equivalence, denoted $P \equiv Q$ (or $P \Leftrightarrow Q$), is an abbreviation for $(P \Rightarrow Q) \wedge (Q \Rightarrow P)$. We
often say “P if and only if Q” or even “P iff Q” for $P \equiv Q$. As we will see shortly, to prove
a logical equivalence, $P \equiv Q$, we have to prove both implications $P \Rightarrow Q$ and $Q \Rightarrow P$.

An implication $P \Rightarrow Q$ should be understood as an if-then statement, that is, if P is
true then Q is also true. So, the meaning of negation is that if $\neg P$ holds then P must be
false. Otherwise, as $\neg P$ is really $P \Rightarrow \bot$, if P were true, then $\bot$ would have to be true, but
this is absurd.

Of course, there are problems with the above paragraph. What does truth have to do 
with all this? What do we mean when we say “P is true” ? What is the relationship between 
truth and provability? 

These are actually deep (and tricky!) questions whose answers are not so obvious. One 
of the major roles of logic is to clarify the notion of truth and its relationship to provability. 
We will avoid these fundamental issues by dealing exclusively with the notion of proof. So,
the big question is: What is a proof? 

Typically, the statements that we prove depend on some set of hypotheses, also called
premises (or assumptions). As we shall see shortly, this amounts to proving implications of
the form

$$(P_1 \wedge P_2 \wedge \cdots \wedge P_n) \Rightarrow Q.$$

However, there are certain advantages in defining the notion of proof (or deduction) of a
proposition from a set of premises. Sets of premises are usually denoted using upper-case
Greek letters such as $\Gamma$ or $\Delta$.

Roughly speaking, a deduction of a proposition Q from a set of premises Γ is a finite
labeled tree whose root is labeled with Q (the conclusion), whose leaves are labeled with
premises from Γ (possibly with multiple occurrences), and such that every interior node
corresponds to one of a given set of proof rules (or inference rules). Certain simple deduction
trees are declared as obvious proofs, also called axioms.

There are many kinds of proof systems: Hilbert-style systems, natural-deduction systems,
Gentzen sequent systems, etc. We describe a so-called natural-deduction system
invented by G. Gentzen in the early 1930s (and thoroughly investigated by D. Prawitz in
the mid 1960s). The major advantage of this system is that it captures quite nicely the
“natural” rules of reasoning that one uses when proving mathematical statements. This does
not mean that it is easy to find proofs in such a system or that this system is indeed very
intuitive! We begin with the inference rules for implication.

In the definition below, the expression $\Gamma, P$ stands for the union of Γ and $\{P\}$. So, P may
already belong to Γ. A picture such as

$$\begin{array}{c}\Delta\\ \vdots\\ P\end{array}$$

represents a deduction tree whose root is labeled with P and whose leaves are labeled with
propositions from Δ (possibly with multiple occurrences). Some of the propositions in Δ
may be tagged by variables. The list of untagged propositions in Δ is the list of premises of
the deduction tree. For example, in the deduction tree below,

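$$\dfrac{\dfrac{P \Rightarrow (R \Rightarrow S) \quad P}{R \Rightarrow S} \qquad \dfrac{Q \Rightarrow R \quad \dfrac{P \Rightarrow Q \quad P}{Q}}{R}}{S}$$

no leaf is tagged, so the premises form the set

$$\Delta = \{P \Rightarrow (R \Rightarrow S),\ P,\ Q \Rightarrow R,\ P \Rightarrow Q\},$$

with two occurrences of P, and the conclusion is S.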
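The bookkeeping in this example can be mechanized. The Python sketch below uses our own hypothetical names (`Imp`, `Tree`, `premises`): atoms are plain strings, a deduction tree is a labeled tree whose children are the nodes immediately above a bar, and the premises are collected as a multiset (a `Counter`), so that the two occurrences of P are kept. It only records the shape of the tree; it does not check that each inference is an instance of a legal rule.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Imp:
    """The proposition left => right; atoms are plain strings such as "P"."""
    left: object
    right: object

@dataclass
class Tree:
    label: object                                  # proposition labeling this node
    children: list = field(default_factory=list)   # subtrees immediately above the bar
    tag: Optional[str] = None                      # discharge tag of a leaf, e.g. "x"

def premises(t):
    """Multiset of the untagged leaf labels, i.e. the premises of the deduction."""
    if not t.children:
        return Counter() if t.tag is not None else Counter([t.label])
    total = Counter()
    for child in t.children:
        total += premises(child)
    return total

# The deduction of S displayed above.
P, Q, R, S = "P", "Q", "R", "S"
r_implies_s = Tree(Imp(R, S), [Tree(Imp(P, Imp(R, S))), Tree(P)])
q = Tree(Q, [Tree(Imp(P, Q)), Tree(P)])
r = Tree(R, [Tree(Imp(Q, R)), q])
root = Tree(S, [r_implies_s, r])

print(root.label)         # S
print(premises(root)[P])  # 2
```

A multiset rather than a set is used so that the count of each premise survives, matching the remark about the two occurrences of P.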
Certain inference rules have the effect that some of the original premises may be
discarded; the traditional jargon is that some premises may be discharged (or closed). This
is the case for the inference rule whose conclusion is an implication. When one or several
occurrences of some proposition, P, are discharged by an inference rule, these occurrences
(which label some leaves) are tagged with some new variable not already appearing in the
deduction tree. If x is a new tag, the tagged occurrences of P are denoted $P^x$ and we indicate
the fact that premises were discharged by that inference by writing x immediately to the
right of the inference bar. For example,



$$\dfrac{\begin{array}{c}P^x,\ Q\\ \vdots\\ Q\end{array}}{P \Rightarrow Q}\;x$$

is a deduction tree in which the premise P is discharged by the inference rule. This deduction 
tree only has Q as a premise, since P is discharged. 

What is the meaning of the horizontal bars? Actually, nothing really! Here, we are victims
of an old habit in logic. Observe that there is always a single proposition immediately under
a bar but there may be several propositions immediately above a bar. The intended meaning
of the bar is that the proposition below it is obtained as the result of applying an inference
rule to the propositions above it. For example, in

$$\dfrac{Q \Rightarrow R \quad Q}{R}$$

the proposition R is the result of applying the $\Rightarrow$-elimination rule (see Definition 1.2.1 below)
to the two premises $Q \Rightarrow R$ and Q. Thus, the use of the bar is just a convention used by
logicians going back at least to the 1900s. Removing the bar everywhere would not change
anything in our trees, except perhaps reduce their readability! Since most logic books draw
proof trees using bars to indicate inferences, we also use bars in depicting our proof trees.

Since propositions do not arise from the vacuum but instead are built up from a set
of atomic propositions using logical connectives (here, $\Rightarrow$), we assume the existence of an
“official set of atomic propositions”, $PS = \{P_1, P_2, P_3, \ldots\}$. So, for example, $P_1 \Rightarrow P_2$
and $P_1 \Rightarrow (P_2 \Rightarrow P_1)$ are propositions. Typically, we will use upper-case letters such as
P, Q, R, S, A, B, C, etc., to denote arbitrary propositions formed using atoms from PS.

Definition 1.2.1 The axioms and inference rules for implicational logic are:

Axioms:

$$\dfrac{\Gamma, P}{P}$$

The above is a concise way of denoting a tree whose leaves are labeled with P and the
propositions in Γ, each of these propositions (including P) having possibly multiple
occurrences but at least one, and whose root is labeled with P. A more explicit form is
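$$\dfrac{\underbrace{P_1, \ldots, P_1}_{k_1},\ \ldots,\ \underbrace{P_i, \ldots, P_i}_{k_i},\ \ldots,\ \underbrace{P_n, \ldots, P_n}_{k_n}}{P_i}$$

where $k_1, \ldots, k_n \geq 0$, $n \geq 1$ and $k_i \geq 1$ for some i with $1 \leq i \leq n$. This axiom says that we
always have a deduction of $P_i$ from any set of premises including $P_i$.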

The $\Rightarrow$-introduction rule:

$$\dfrac{\begin{array}{c}\Gamma, P^x\\ \vdots\\ Q\end{array}}{P \Rightarrow Q}\;x$$

This inference rule says that if there is a deduction of Q from the premises in Γ and from
the premise P, then there is a deduction of $P \Rightarrow Q$ from Γ. Note that this inference rule has
the additional effect of discharging some occurrences of the premise P. These occurrences
are tagged with a new variable, x, and the tag x is also placed immediately to the right of
the inference bar. This is a reminder that the deduction tree whose conclusion is $P \Rightarrow Q$ no
longer has the occurrences of P labeled with x as premises.

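The $\Rightarrow$-elimination rule:

$$\dfrac{\begin{array}{c}\Gamma\\ \vdots\\ P \Rightarrow Q\end{array} \qquad \begin{array}{c}\Delta\\ \vdots\\ P\end{array}}{Q}$$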

This rule is also known as modus ponens. 

In the above axioms and rules, Γ or Δ may be empty and P, Q denote arbitrary
propositions built up from the atoms in PS. A deduction tree is a tree whose interior nodes
correspond to applications of the above inference rules. A proof tree is a deduction tree
such that all its premises are discharged. The above proof system is denoted $\mathcal{N}^{\Rightarrow}_m$ (here, the
subscript m stands for minimal, referring to the fact that this is a bare-bones logical system).
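To make the bookkeeping of Definition 1.2.1 concrete, here is a minimal checker for $\mathcal{N}^{\Rightarrow}_m$ written in Python. The representation is our own choice, not part of the text: `Hyp` is a leaf (an axiom $\Gamma, P / P$ is represented simply by a leaf labeled P), `Intro` is an application of $\Rightarrow$-introduction with its discharge label, and `Elim` is an application of $\Rightarrow$-elimination. The hypothetical function `check` returns the conclusion together with the multiset of still-active premises and rejects a label that is used twice, so a proof tree is exactly a deduction whose active premises are all gone.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Imp:
    left: object          # atoms are plain strings such as "P"
    right: object

@dataclass(frozen=True)
class Hyp:
    formula: object       # a leaf labeled with a premise
    tag: object = None    # discharge label such as "x", or None for an ordinary premise

@dataclass(frozen=True)
class Intro:
    tag: str              # label written to the right of the bar
    hyp: object           # the proposition P being discharged
    body: object          # deduction of Q from Gamma, P^tag

@dataclass(frozen=True)
class Elim:
    major: object         # deduction of P => Q
    minor: object         # deduction of P

def check(d):
    """Return (conclusion, active premises, labels used); raise ValueError on a bad step."""
    if isinstance(d, Hyp):
        return d.formula, Counter([(d.tag, d.formula)]), set()
    if isinstance(d, Elim):
        f, active1, used1 = check(d.major)
        a, active2, used2 = check(d.minor)
        if not (isinstance(f, Imp) and f.left == a):
            raise ValueError("=>-elim: premises do not have the form P => Q and P")
        if used1 & used2:
            raise ValueError("a discharge label is used more than once")
        return f.right, active1 + active2, used1 | used2
    if isinstance(d, Intro):
        q, active, used = check(d.body)
        if d.tag is None or d.tag in used:
            raise ValueError(f"label {d.tag!r} is missing or already used")
        if any(tag == d.tag and formula != d.hyp for tag, formula in active):
            raise ValueError(f"a leaf tagged {d.tag!r} is not labeled {d.hyp!r}")
        remaining = Counter({k: n for k, n in active.items() if k[0] != d.tag})
        return Imp(d.hyp, q), remaining, used | {d.tag}
    raise TypeError(f"not a deduction: {d!r}")

def is_proof(d):
    # A proof tree is a deduction tree none of whose premises is still active.
    return not check(d)[1]

# Proof (b) from the examples below: (P => Q) => ((Q => R) => (P => R)).
P, Q, R = "P", "Q", "R"
q = Elim(Hyp(Imp(P, Q), "z"), Hyp(P, "x"))
r = Elim(Hyp(Imp(Q, R), "y"), q)
proof_b = Intro("z", Imp(P, Q), Intro("y", Imp(Q, R), Intro("x", P, r)))
print(check(proof_b)[0] == Imp(Imp(P, Q), Imp(Imp(Q, R), Imp(P, R))), is_proof(proof_b))  # True True
```

Discharging zero occurrences is allowed here, in line with the later remark that extra premises can always be added and then discharged.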

In words, the $\Rightarrow$-introduction rule says that in order to prove an implication $P \Rightarrow Q$
from a set of premises Γ, we assume that P has already been proved, add P to the premises
in Γ and then prove Q from Γ and P. Once this is done, the premise P is deleted. This
rule formalizes the kind of reasoning that we all perform whenever we prove an implication
statement. In that sense, it is a natural and familiar rule, except that we perhaps never
stopped to think about what we are really doing. However, the business about discharging
the premise P when we are through with our argument is a bit puzzling. Most people
probably never carry out this “discharge step” consciously, but such a process does take
place implicitly.

It might help to view the action of proving an implication $P \Rightarrow Q$ as the construction
of a program that converts a proof of P into a proof of Q. Then, if we supply a proof of
P as input to this program (the proof of $P \Rightarrow Q$), it will output a proof of Q. So, if we
don’t give the right kind of input to this program, for example, a “wrong proof” of P, we
should not expect that the program returns a proof of Q. However, this does not say that the
program is incorrect; the program was designed to do the right thing only if it is given the
right kind of input. From this functional point of view (also called constructive), if we take
the simplistic view that P and Q assume the truth values true and false, we should not be
shocked that if we give as input the value false (for P), then the truth value of the whole
implication $P \Rightarrow Q$ is true. The program $P \Rightarrow Q$ is designed to produce the output value
true (for Q) if it is given the input value true (for P). So, this program only goes wrong
when, given the input true (for P), it returns the value false (for Q). In this erroneous
case, $P \Rightarrow Q$ should indeed receive the value false. However, in all other cases, the program
works correctly, even if it is given the wrong input (false for P).
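Under the simplistic two-valued reading just described (and only as a heuristic; the questions raised earlier about truth versus provability still apply), implication becomes the familiar classical truth table. The short Python sketch below tabulates it, together with negation as an abbreviation for $P \Rightarrow \bot$ and equivalence as a pair of implications; the names `implies`, `neg`, `iff` and `FALSUM` are our own.

```python
from itertools import product

FALSUM = False  # the constant ⊥, read as the truth value false

def implies(p, q):
    # The "program" P => Q only goes wrong when, given true for P, it returns false for Q.
    return not (p and not q)

def neg(p):
    # ¬P is just a shorthand for P => ⊥
    return implies(p, FALSUM)

def iff(p, q):
    # P ≡ Q abbreviates (P => Q) ∧ (Q => P)
    return implies(p, q) and implies(q, p)

for p, q in product([True, False], repeat=2):
    print(f"P={p!s:<5}  Q={q!s:<5}  P=>Q={implies(p, q)}")
```

Only the row with P true and Q false yields false, which is exactly the single “erroneous” behavior of the program described above.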



Remarks:

1. Only the leaves of a deduction tree may be discharged. Interior nodes, including the
root, are never discharged.

2. Once a set of leaves labeled with some premise P marked with the label x has been
discharged, none of these leaves can be discharged again. So, each label (say x) can
only be used once. This corresponds to the fact that some leaves of our deduction trees
get “killed off” (discharged).

3. A proof is a deduction tree whose leaves are all discharged (Γ is empty). This
corresponds to the philosophy that if a proposition has been proved, then the validity of
the proof should not depend on any assumptions that are still active. We may think
of a deduction tree as an unfinished proof tree.

4. When constructing a proof tree, we have to be careful not to include (accidentally) extra
premises that end up not being discharged. If this happens, we probably made a
mistake and the redundant premises should be deleted. On the other hand, if we have
a proof tree, we can always add extra premises to the leaves and create a new proof
tree from the previous one by discharging all the new premises.

5. Beware, when we deduce that an implication $P \Rightarrow Q$ is provable, we do not prove
that P and Q are provable; we only prove that if P is provable then Q is provable.

The $\Rightarrow$-elimination rule formalizes the use of auxiliary lemmas, a mechanism that we use
all the time in making mathematical proofs. Think of $P \Rightarrow Q$ as a lemma that has already
been established and belongs to some database of (useful) lemmas. This lemma says that if I can
prove P then I can prove Q. Now, suppose that we manage to give a proof of P. It follows
from the $\Rightarrow$-elimination rule that Q is also provable.

Observe that in an introduction rule, the conclusion contains the logical connective
associated with the rule, in this case, $\Rightarrow$; this justifies the terminology “introduction”. On the
other hand, in an elimination rule, the logical connective associated with the rule is gone
(although it may still appear in Q). The other inference rules for $\wedge$, $\vee$, etc., will follow this
pattern of introduction and elimination.

Examples of proof trees. 

(a)

$$\dfrac{\dfrac{P^x}{P}}{P \Rightarrow P}\;x$$

So, $P \Rightarrow P$ is provable; this is the least we should expect from our proof system!

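(b)

$$\dfrac{\dfrac{\dfrac{\dfrac{(Q \Rightarrow R)^y \quad \dfrac{(P \Rightarrow Q)^z \quad P^x}{Q}}{R}}{P \Rightarrow R}\;x}{(Q \Rightarrow R) \Rightarrow (P \Rightarrow R)}\;y}{(P \Rightarrow Q) \Rightarrow ((Q \Rightarrow R) \Rightarrow (P \Rightarrow R))}\;z$$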
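The functional reading of implication given earlier can be made literal in a programming language; this is the idea behind the Curry-Howard correspondence mentioned in Section 1.7. In the Python sketch below (our own illustration; Python does not check type hints at run time, so they only document the intended propositions), a proof of $P \Rightarrow Q$ is a function, $\Rightarrow$-elimination is function application, and the discharge labels x, y, z of proof (b) become the parameters of nested functions. The stand-in “proofs” `succ` and `double` are hypothetical.

```python
from typing import Callable, TypeVar

P = TypeVar("P")
Q = TypeVar("Q")
R = TypeVar("R")

def modus_ponens(lemma: Callable[[P], Q], proof_of_p: P) -> Q:
    """=>-elimination: run the program proving P => Q on a proof of P."""
    return lemma(proof_of_p)

def proof_a(x: P) -> P:
    """Proof (a) of P => P: discharging x just returns the assumption."""
    return x

def proof_b(z: Callable[[P], Q]) -> Callable[[Callable[[Q], R]], Callable[[P], R]]:
    """Proof (b): given z : P => Q, then y : Q => R, then x : P, produce R as y(z(x))."""
    return lambda y: lambda x: y(z(x))

# Hypothetical evidence: integers play the role of proofs of atomic propositions.
def succ(n):    # stands for a proof of P => Q
    return n + 1

def double(n):  # stands for a proof of Q => R
    return 2 * n

print(modus_ponens(proof_b(succ)(double), 3))  # 8
```

The order of the nested parameters mirrors the order of discharge: x is discharged first (innermost), z last (outermost).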

In order to better appreciate the difference between a deduction tree and a proof tree,
consider the following two examples:

1. The tree below is a deduction tree, since two of its leaves are labeled with the premises
$P \Rightarrow Q$ and $Q \Rightarrow R$, that have not been discharged yet. So, this tree represents a deduction
of $P \Rightarrow R$ from the set of premises $\Gamma = \{P \Rightarrow Q, Q \Rightarrow R\}$ but it is not a proof tree since
$\Gamma \neq \emptyset$. However, observe that the original premise, P, labeled x, has been discharged.

$$\dfrac{\dfrac{Q \Rightarrow R \quad \dfrac{P \Rightarrow Q \quad P^x}{Q}}{R}}{P \Rightarrow R}\;x$$

2. The next tree was obtained from the previous one by applying the $\Rightarrow$-introduction
rule which triggered the discharge of the premise $Q \Rightarrow R$ labeled y, which is no longer active.
However, the premise $P \Rightarrow Q$ is still active (has not been discharged, yet), so the tree below
is a deduction tree of $(Q \Rightarrow R) \Rightarrow (P \Rightarrow R)$ from the set of premises $\Gamma = \{P \Rightarrow Q\}$. It is
not yet a proof tree since $\Gamma \neq \emptyset$.







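$$\dfrac{\dfrac{\dfrac{(Q \Rightarrow R)^y \quad \dfrac{P \Rightarrow Q \quad P^x}{Q}}{R}}{P \Rightarrow R}\;x}{(Q \Rightarrow R) \Rightarrow (P \Rightarrow R)}\;y$$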

Finally, one more application of the $\Rightarrow$-introduction rule will discharge the premise
$P \Rightarrow Q$, at last, yielding the proof tree in (b).

(c) In the next example, the two occurrences of A labeled x are discharged simultaneously. 

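$$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{(A \Rightarrow (B \Rightarrow C))^z \quad A^x}{B \Rightarrow C} \quad \dfrac{(A \Rightarrow B)^y \quad A^x}{B}}{C}}{A \Rightarrow C}\;x}{(A \Rightarrow B) \Rightarrow (A \Rightarrow C)}\;y}{(A \Rightarrow (B \Rightarrow C)) \Rightarrow ((A \Rightarrow B) \Rightarrow (A \Rightarrow C))}\;z$$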

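(d) In contrast to Example (c), in the proof tree below the two occurrences of A are
discharged separately. To this effect, they are labeled differently.

$$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{(A \Rightarrow (B \Rightarrow C))^z \quad A^x}{B \Rightarrow C} \quad \dfrac{(A \Rightarrow B)^y \quad A^t}{B}}{C}}{A \Rightarrow C}\;x}{(A \Rightarrow B) \Rightarrow (A \Rightarrow C)}\;y}{(A \Rightarrow (B \Rightarrow C)) \Rightarrow ((A \Rightarrow B) \Rightarrow (A \Rightarrow C))}\;z}{A \Rightarrow ((A \Rightarrow (B \Rightarrow C)) \Rightarrow ((A \Rightarrow B) \Rightarrow (A \Rightarrow C)))}\;t$$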
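Continuing the functional reading (again a Python sketch of our own, with hypothetical names), the difference between (c) and (d) is visible in the programs: in (c) the single label x stands for both occurrences of A, so the argument x is used twice, whereas in (d) the labels x and t are distinct parameters. The program for (c) is the combinator usually called S.

```python
def proof_c(z):
    """Proof (c): both leaves A^x are discharged together, so x is used twice."""
    return lambda y: lambda x: z(x)(y(x))

def proof_d(t):
    """Proof (d): the second occurrence of A is discharged separately, as t."""
    return lambda z: lambda y: lambda x: z(x)(y(t))

# Hypothetical evidence: z : A => (B => C) pairs its two inputs, y : A => B scales by 10.
def pair(a):
    return lambda b: (a, b)

def scale(a):
    return 10 * a

print(proof_c(pair)(scale)(2))     # (2, 20)
print(proof_d(1)(pair)(scale)(2))  # (2, 10)
```

In (d) the value bound to t, not the one bound to x, is fed to y, which is the computational trace of discharging the two copies of A separately.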

Remark: How do we find these proof trees? Well, we could try to enumerate all possible
proof trees systematically and see if a proof of the desired conclusion turns up. Obviously,
this is a very inefficient procedure and moreover, how do we know that all possible proof
trees will be generated and how do we know that such a method will terminate after a finite
number of steps (what if the proposition proposed as a conclusion of a proof is not provable)?
This is a very difficult problem and, in general, it can be shown that there is no procedure
that will give an answer in all cases and terminate in a finite number of steps for all possible
input propositions. We will come back to this point in Section 1.7. However, for the system
$\mathcal{N}^{\Rightarrow}_m$, such a procedure exists, but it is not easy to prove that it terminates in all cases and
in fact, it can take a very long time.

What we did, and we strongly advise our readers to try it when they attempt to construct
proof trees, is to construct the proof tree from the bottom up, starting from the proposition
labeling the root, rather than top-down, i.e., starting from the leaves. During this process,
whenever we are trying to prove a proposition $P \Rightarrow Q$, we use the $\Rightarrow$-introduction rule
backward, i.e., we add P to the set of active premises and we try to prove Q from this
new set of premises. At some point, we get stuck with an atomic proposition, say Q. Call
the resulting deduction $\mathcal{D}_{bu}$; note that Q is the only active (undischarged) premise of $\mathcal{D}_{bu}$
and the node labeled Q immediately below it plays a special role; we will call it the special
node of $\mathcal{D}_{bu}$. The trick is to now switch strategy and start building a proof tree top-down,
starting from the leaves, using the $\Rightarrow$-elimination rule. If everything works out well, we get a
deduction with root Q, say $\mathcal{D}_{td}$, and then we glue this deduction $\mathcal{D}_{td}$ to the deduction $\mathcal{D}_{bu}$ in
such a way that the root of $\mathcal{D}_{td}$ is identified with the special node of $\mathcal{D}_{bu}$ labeled Q. We also
have to make sure that all the discharged premises are linked to the correct instance of the
$\Rightarrow$-introduction rule that caused them to be discharged. One of the difficulties is that during
the bottom-up process, we don’t know how many copies of a premise need to be discharged
in a single step. We only find out how many copies of a premise need to be discharged during
the top-down process.
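The bottom-up/top-down strategy can be sketched as a small program. In the Python sketch below (our own illustration, with hypothetical names `provable` and `split`), the bottom-up phase applies $\Rightarrow$-introduction backward until the goal is atomic; the top-down phase looks for an active premise of the form $A_1 \Rightarrow (A_2 \Rightarrow \cdots \Rightarrow (A_n \Rightarrow Q))$ whose final conclusion is the atomic goal Q and tries to prove each $A_i$, so that n uses of $\Rightarrow$-elimination yield Q. Premises are kept as a set, which sidesteps the question of how many copies to discharge, and a state (premises, atomic goal) that repeats along a branch is abandoned. This loop check is our addition and is what makes the search terminate; we do not prove here that the search is complete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Imp:
    left: object          # atoms are plain strings such as "P"
    right: object

def split(f):
    """Write f as A1 => (A2 => ... => (An => head)) and return ([A1, ..., An], head)."""
    args = []
    while isinstance(f, Imp):
        args.append(f.left)
        f = f.right
    return args, f

def provable(goal, premises=frozenset(), seen=frozenset()):
    """Search for a proof of goal from the active premises (a hypothetical helper)."""
    # Bottom-up phase: apply =>-introduction backward while the goal is an implication.
    while isinstance(goal, Imp):
        premises = premises | {goal.left}
        goal = goal.right
    # The goal is now atomic (the "special node"); give up on a state already on this branch.
    state = (premises, goal)
    if state in seen:
        return False
    seen = seen | {state}
    if goal in premises:          # an axiom closes the branch
        return True
    # Top-down phase: use a premise A1 => ... => An => goal and n =>-eliminations.
    for h in premises:
        args, head = split(h)
        if args and head == goal and all(provable(a, premises, seen) for a in args):
            return True
    return False

P, Q, R = "P", "Q", "R"
print(provable(Imp(Imp(P, Q), Imp(Imp(Q, R), Imp(P, R)))))  # True: proof (b)
print(provable(Imp(Imp(Imp(P, Q), P), P)))                 # False: Peirce's law
```

The second call illustrates a formula that is valid in classical logic (Peirce's law) but has no proof in $\mathcal{N}^{\Rightarrow}_m$, which the search correctly reports.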

Here is an illustration of this method for our third example, (c). At the end of the bottom-up
process, we get the deduction tree $\mathcal{D}_{bu}$:

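$$\dfrac{\dfrac{\dfrac{\dfrac{(A \Rightarrow (B \Rightarrow C))^z \quad (A \Rightarrow B)^y \quad A^x \quad C}{C}}{A \Rightarrow C}\;x}{(A \Rightarrow B) \Rightarrow (A \Rightarrow C)}\;y}{(A \Rightarrow (B \Rightarrow C)) \Rightarrow ((A \Rightarrow B) \Rightarrow (A \Rightarrow C))}\;z$$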

At the end of the top-down process, we get the deduction tree $\mathcal{D}_{td}$:

$$\dfrac{\dfrac{A \Rightarrow (B \Rightarrow C) \quad A}{B \Rightarrow C} \quad \dfrac{A \Rightarrow B \quad A}{B}}{C}$$

Finally, after gluing $\mathcal{D}_{td}$ on top of $\mathcal{D}_{bu}$ (which has the correct number of premises to be
discharged), we get our proof tree:
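$$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{(A \Rightarrow (B \Rightarrow C))^z \quad A^x}{B \Rightarrow C} \quad \dfrac{(A \Rightarrow B)^y \quad A^x}{B}}{C}}{A \Rightarrow C}\;x}{(A \Rightarrow B) \Rightarrow (A \Rightarrow C)}\;y}{(A \Rightarrow (B \Rightarrow C)) \Rightarrow ((A \Rightarrow B) \Rightarrow (A \Rightarrow C))}\;z$$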



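Let us return to the functional interpretation of implication by giving an example. The
proposition $P \Rightarrow ((P \Rightarrow Q) \Rightarrow Q)$ has the following proof:

$$\dfrac{\dfrac{\dfrac{(P \Rightarrow Q)^x \quad P^y}{Q}}{(P \Rightarrow Q) \Rightarrow Q}\;x}{P \Rightarrow ((P \Rightarrow Q) \Rightarrow Q)}\;y$$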

Now, say P is the proposition $R \Rightarrow R$, which has the proof

$$\dfrac{\dfrac{R^z}{R}}{R \Rightarrow R}\;z$$

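Using $\Rightarrow$-elimination, we obtain a proof of $((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q$ from the proof of
$(R \Rightarrow R) \Rightarrow (((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q)$ and the proof of $R \Rightarrow R$:

$$\dfrac{\dfrac{\dfrac{\dfrac{((R \Rightarrow R) \Rightarrow Q)^x \quad (R \Rightarrow R)^y}{Q}}{((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q}\;x}{(R \Rightarrow R) \Rightarrow (((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q)}\;y \qquad \dfrac{\dfrac{R^z}{R}}{R \Rightarrow R}\;z}{((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q}$$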

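Note that the above proof is redundant. A more direct proof can be obtained as follows:
Undo the last $\Rightarrow$-introduction in the proof of $(R \Rightarrow R) \Rightarrow (((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q)$:

$$\dfrac{\dfrac{((R \Rightarrow R) \Rightarrow Q)^x \quad R \Rightarrow R}{Q}}{((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q}\;x$$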
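and then glue the proof of $R \Rightarrow R$ on top of the leaf $R \Rightarrow R$, obtaining the desired proof of
$((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q$:

$$\dfrac{\dfrac{((R \Rightarrow R) \Rightarrow Q)^x \quad \dfrac{\dfrac{R^z}{R}}{R \Rightarrow R}\;z}{Q}}{((R \Rightarrow R) \Rightarrow Q) \Rightarrow Q}\;x$$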

In general, one has to exercise care with the label variables. It may be necessary to
rename some of these variables to avoid clashes. What we have above is an example of proof
substitution, also called proof normalization. We will come back to this topic in Section 1.7.
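The redundancy removed above, an $\Rightarrow$-introduction immediately followed by an $\Rightarrow$-elimination, can be eliminated mechanically. The Python sketch below is our own: it uses hypothetical classes `Hyp`, `Intro`, `Elim` for leaves and the two rules, assumes all discharge labels are distinct (so no renaming is performed), and replaces every leaf tagged with the discharged label by the proof of the minor premise, which is the gluing step just described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Imp:
    left: object          # atoms are plain strings such as "R"
    right: object

@dataclass(frozen=True)
class Hyp:                # a leaf labeled with a premise, possibly tagged
    formula: object
    tag: object = None

@dataclass(frozen=True)
class Intro:              # =>-introduction discharging the leaves tagged `tag`
    tag: str
    hyp: object
    body: object

@dataclass(frozen=True)
class Elim:               # =>-elimination: major proves P => Q, minor proves P
    major: object
    minor: object

def substitute(d, tag, proof):
    """Glue `proof` on top of every leaf tagged `tag` (labels assumed distinct)."""
    if isinstance(d, Hyp):
        return proof if d.tag == tag else d
    if isinstance(d, Intro):
        return Intro(d.tag, d.hyp, substitute(d.body, tag, proof))
    return Elim(substitute(d.major, tag, proof), substitute(d.minor, tag, proof))

def normalize(d):
    """Remove every =>-introduction that is immediately followed by an =>-elimination."""
    if isinstance(d, Hyp):
        return d
    if isinstance(d, Intro):
        return Intro(d.tag, d.hyp, normalize(d.body))
    major, minor = normalize(d.major), normalize(d.minor)
    if isinstance(major, Intro):
        # the detour: major discharges P^x, so glue the proof of P onto those leaves
        return normalize(substitute(major.body, major.tag, minor))
    return Elim(major, minor)

# The redundant proof of ((R => R) => Q) => Q discussed above.
R, Q = "R", "Q"
RR = Imp(R, R)
proof_rr = Intro("z", R, Hyp(R, "z"))
lemma = Intro("y", RR, Intro("x", Imp(RR, Q), Elim(Hyp(Imp(RR, Q), "x"), Hyp(RR, "y"))))
direct = normalize(Elim(lemma, proof_rr))
print(direct == Intro("x", Imp(RR, Q), Elim(Hyp(Imp(RR, Q), "x"), proof_rr)))  # True
```

The result is exactly the more direct proof obtained by hand: the leaf labeled $R \Rightarrow R$ has been replaced by the proof of $R \Rightarrow R$.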

The process of discharging premises when constructing a deduction is admittedly a bit 
confusing. Part of the problem is that a deduction tree really represents the last of a sequence 
of stages (corresponding to the application of inference rules) during which the current set 
of “active” premises, that is, those premises that have not yet been discharged (closed, 
cancelled) evolves (in fact, shrinks). Some mechanism is needed to keep track of which 
premises are no longer active and this is what this business of labeling premises with variables 
achieves. Historically, this is the first mechanism that was invented. However, Gentzen (in 
the 1930’s) came up with an alternative solution which is mathematically easier to handle. 
Moreover, it turns out that this notation is also better suited to computer implementations, 
if one wishes to implement an automated theorem prover. 

The point is to keep a record of all undischarged assumptions at every stage of the deduction. Thus, a deduction is now a tree whose nodes are labeled with expressions of the form $\Gamma \to P$, called sequents, where $P$ is a proposition, and $\Gamma$ is a record of all undischarged assumptions at the stage of the deduction associated with this node.

During the construction of a deduction tree, it is necessary to discharge packets of assumptions consisting of one or more occurrences of the same proposition. To this effect, it is convenient to tag packets of assumptions with labels, in order to discharge the propositions in these packets in a single step. We use variables for the labels, and a packet labeled with $x$ consisting of occurrences of the proposition $P$ is written as $x : P$. Thus, in a sequent $\Gamma \to P$, the expression $\Gamma$ is any finite set of the form $x_1 : P_1, \ldots, x_m : P_m$, where the $x_i$ are pairwise distinct (but the $P_i$ need not be distinct). Given $\Gamma = x_1 : P_1, \ldots, x_m : P_m$, the notation $\Gamma, x : P$ is only well defined when $x \neq x_i$ for all $i$, $1 \leq i \leq m$, in which case it denotes the set $x_1 : P_1, \ldots, x_m : P_m, x : P$.

Using sequents, the axioms and rules of Definition 1.2.1 are now expressed as follows:

Definition 1.2.2 The axioms and inference rules of the system $\mathcal{NG}_m^{\Rightarrow}$ (implicational logic, Gentzen-sequent style; the $\mathcal{G}$ in $\mathcal{NG}$ stands for Gentzen) are listed below:







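$$
\Gamma, x : P \to P \qquad (\text{axioms})
$$

$$
\dfrac{\Gamma, x : P \to Q}{\Gamma \to P \Rightarrow Q}\;(\Rightarrow\text{-intro})
\qquad\qquad
\dfrac{\Gamma \to P \Rightarrow Q \qquad \Gamma \to P}{\Gamma \to Q}\;(\Rightarrow\text{-elim})
$$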



In an application of the rule ($\Rightarrow$-intro), observe that in the lower sequent, the proposition $P$ (labeled $x$) is deleted from the list of premises occurring on the left-hand side of the arrow in the upper sequent. We say that the proposition $P$ which appears as a hypothesis of the deduction is discharged (or closed). It is important to note that the ability to label packets consisting of occurrences of the same proposition with different labels is essential in order to have control over which groups of packets of assumptions are discharged simultaneously. Equivalently, we could avoid tagging packets of assumptions with variables if we assumed that in a sequent $\Gamma \to C$, the expression $\Gamma$, also called a context, is a multiset of propositions.

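Below we show a proof of the third example given above in our new system. Let

$$
\Gamma = x : A \Rightarrow (B \Rightarrow C),\ y : A \Rightarrow B,\ z : A.
$$

$$
\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{\Gamma \to A \Rightarrow (B \Rightarrow C) \qquad \Gamma \to A}{\Gamma \to B \Rightarrow C} \qquad \dfrac{\Gamma \to A \Rightarrow B \qquad \Gamma \to A}{\Gamma \to B}}{x : A \Rightarrow (B \Rightarrow C),\ y : A \Rightarrow B,\ z : A \to C}}{x : A \Rightarrow (B \Rightarrow C),\ y : A \Rightarrow B \to A \Rightarrow C}}{x : A \Rightarrow (B \Rightarrow C) \to (A \Rightarrow B) \Rightarrow (A \Rightarrow C)}}{\to (A \Rightarrow (B \Rightarrow C)) \Rightarrow ((A \Rightarrow B) \Rightarrow (A \Rightarrow C))}
$$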

In principle, it does not matter which of the two systems $\mathcal{N}_m^{\Rightarrow}$ or $\mathcal{NG}_m^{\Rightarrow}$ we use to construct deductions; it is a matter of taste. My experience is that I make fewer mistakes with the Gentzen-sequent style system $\mathcal{NG}_m^{\Rightarrow}$.
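Because every node of a sequent deduction records its own context, checking such a deduction is a purely local, mechanical task, which fits the earlier remark that this notation suits computer implementations. Below is a minimal Python sketch of a checker for the three rules of $\mathcal{NG}_m^{\Rightarrow}$. The encoding of propositions as nested tuples and the deduction format (`"ax"`, `"intro"`, `"elim"`) are hypothetical choices made for illustration. Contexts are dictionaries from labels to propositions, so the side condition $x \neq x_i$ on $\Gamma, x : P$ becomes a membership test.

```python
# Propositions: atoms are strings, implications are ("=>", P, Q).
# Deductions (hypothetical encoding):
#   ("ax", x)              Gamma, x: P -> P
#   ("intro", x, P, d)     from Gamma, x: P -> Q infer Gamma -> P => Q
#   ("elim", d1, d2)       from Gamma -> P => Q and Gamma -> P infer Gamma -> Q


def imp(p, q):
    return ("=>", p, q)


def conclude(ctx, d):
    """Return P such that d is a valid deduction of the sequent ctx -> P.

    ctx maps labels to propositions; raises ValueError on an invalid step.
    """
    rule = d[0]
    if rule == "ax":
        label = d[1]
        if label not in ctx:
            raise ValueError(f"label {label!r} is not an active premise")
        return ctx[label]
    if rule == "intro":
        _, label, p, sub = d
        if label in ctx:  # Gamma, x: P is only defined for a fresh label x
            raise ValueError(f"label {label!r} is already in use")
        q = conclude({**ctx, label: p}, sub)
        return imp(p, q)
    if rule == "elim":
        _, d1, d2 = d
        f = conclude(ctx, d1)
        a = conclude(ctx, d2)
        if not (isinstance(f, tuple) and f[0] == "=>" and f[1] == a):
            raise ValueError(f"cannot apply {f!r} to {a!r}")
        return f[2]
    raise ValueError(f"unknown rule {rule!r}")


# The deduction of -> (A => (B => C)) => ((A => B) => (A => C)) shown above.
A, B, C = "A", "B", "C"
s_proof = (
    "intro", "x", imp(A, imp(B, C)),
    ("intro", "y", imp(A, B),
     ("intro", "z", A,
      ("elim",
       ("elim", ("ax", "x"), ("ax", "z")),     # Gamma -> B => C
       ("elim", ("ax", "y"), ("ax", "z"))))),  # Gamma -> B
)
print(conclude({}, s_proof))
# ('=>', ('=>', 'A', ('=>', 'B', 'C')), ('=>', ('=>', 'A', 'B'), ('=>', 'A', 'C')))
```

A wrong step, such as applying `("ax", "y")` to `("ax", "x")`, raises `ValueError` instead of returning a conclusion.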

We now describe the inference rules dealing with the connectives $\wedge$, $\vee$ and $\bot$.



1.3 Adding $\wedge$, $\vee$, $\bot$; The Proof Systems $\mathcal{N}_c^{\Rightarrow,\wedge,\vee,\bot}$ and $\mathcal{NG}_c^{\Rightarrow,\wedge,\vee,\bot}$



Recall that $\neg P$ is an abbreviation for $P \Rightarrow \bot$.
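Definition 1.3.1 The axioms and inference rules for (propositional) classical logic are:

Axioms:

$$
\dfrac{\Gamma, P}{P}
$$

The $\Rightarrow$-introduction rule:

$$
\dfrac{\begin{matrix}\Gamma, P^{x}\\ \vdots\\ Q\end{matrix}}{P \Rightarrow Q}\;x
$$

The $\Rightarrow$-elimination rule:

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ P \Rightarrow Q\end{matrix} \qquad \begin{matrix}\Delta\\ \vdots\\ P\end{matrix}}{Q}
$$

The $\wedge$-introduction rule:

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ P\end{matrix} \qquad \begin{matrix}\Delta\\ \vdots\\ Q\end{matrix}}{P \wedge Q}
$$

The $\wedge$-elimination rule:

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ P \wedge Q\end{matrix}}{P} \qquad\qquad \dfrac{\begin{matrix}\Gamma\\ \vdots\\ P \wedge Q\end{matrix}}{Q}
$$

The $\vee$-introduction rule:

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ P\end{matrix}}{P \vee Q} \qquad\qquad \dfrac{\begin{matrix}\Gamma\\ \vdots\\ Q\end{matrix}}{P \vee Q}
$$

The $\vee$-elimination rule:

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ P \vee Q\end{matrix} \qquad \begin{matrix}\Delta, P^{x}\\ \vdots\\ R\end{matrix} \qquad \begin{matrix}\Lambda, Q^{y}\\ \vdots\\ R\end{matrix}}{R}\;x, y
$$

The $\bot$-elimination rule:

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ \bot\end{matrix}}{P}
$$

The proof-by-contradiction rule (also known as reductio ad absurdum rule, for short RAA):

$$
\dfrac{\begin{matrix}\Gamma, \neg P^{x}\\ \vdots\\ \bot\end{matrix}}{P}\;x
$$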

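Since $\neg P$ is an abbreviation for $P \Rightarrow \bot$, the $\neg$-introduction rule is a special case of the $\Rightarrow$-introduction rule (with $Q = \bot$). However, it is worth stating it explicitly:

The $\neg$-introduction rule:

$$
\dfrac{\begin{matrix}\Gamma, P^{x}\\ \vdots\\ \bot\end{matrix}}{\neg P}\;x
$$

Similarly, the $\neg$-elimination rule is a special case of $\Rightarrow$-elimination applied to $\neg P$ ($= P \Rightarrow \bot$) and $P$:

The $\neg$-elimination rule:

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ \neg P\end{matrix} \qquad \begin{matrix}\Delta\\ \vdots\\ P\end{matrix}}{\bot}
$$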

In the above axioms and rules, $\Gamma$, $\Delta$ or $\Lambda$ may be empty, $P, Q, R$ denote arbitrary propositions built up from the atoms in PS, and all the premises labeled $x$ (or $y$) are discharged. A deduction tree is a tree whose interior nodes correspond to applications of the above inference rules. A proof tree is a deduction tree such that all its premises are discharged. The above proof system is denoted $\mathcal{N}_c^{\Rightarrow,\wedge,\vee,\bot}$ (here, the subscript $c$ stands for classical).

The system obtained by removing the proof-by-contradiction (RAA) rule is called (propositional) intuitionistic logic and is denoted $\mathcal{N}_i^{\Rightarrow,\wedge,\vee,\bot}$. The system obtained by deleting both the $\bot$-elimination rule and the proof-by-contradiction rule is called (propositional) minimal logic and is denoted $\mathcal{N}_m^{\Rightarrow,\wedge,\vee,\bot}$.

The version of $\mathcal{N}_c^{\Rightarrow,\wedge,\vee,\bot}$ in terms of Gentzen sequents is the following:

Definition 1.3.2 The axioms and inference rules of the system $\mathcal{NG}_c^{\Rightarrow,\wedge,\vee,\bot}$ (of propositional classical logic, Gentzen-sequent style) are listed below:

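$$
\Gamma, x : P \to P \qquad (\text{axioms})
$$

$$
\dfrac{\Gamma, x : P \to Q}{\Gamma \to P \Rightarrow Q}\;(\Rightarrow\text{-intro})
\qquad\qquad
\dfrac{\Gamma \to P \Rightarrow Q \qquad \Gamma \to P}{\Gamma \to Q}\;(\Rightarrow\text{-elim})
$$

$$
\dfrac{\Gamma \to P \qquad \Gamma \to Q}{\Gamma \to P \wedge Q}\;(\wedge\text{-intro})
$$

$$
\dfrac{\Gamma \to P \wedge Q}{\Gamma \to P}\;(\wedge\text{-elim})
\qquad\qquad
\dfrac{\Gamma \to P \wedge Q}{\Gamma \to Q}\;(\wedge\text{-elim})
$$

$$
\dfrac{\Gamma \to P}{\Gamma \to P \vee Q}\;(\vee\text{-intro})
\qquad\qquad
\dfrac{\Gamma \to Q}{\Gamma \to P \vee Q}\;(\vee\text{-intro})
$$

$$
\dfrac{\Gamma \to P \vee Q \qquad \Gamma, x : P \to R \qquad \Gamma, y : Q \to R}{\Gamma \to R}\;(\vee\text{-elim})
$$

$$
\dfrac{\Gamma \to \bot}{\Gamma \to P}\;(\bot\text{-elim})
\qquad\qquad
\dfrac{\Gamma, x : \neg P \to \bot}{\Gamma \to P}\;(\text{by-contra})
$$

$$
\dfrac{\Gamma, x : P \to \bot}{\Gamma \to \neg P}\;(\neg\text{-introduction})
\qquad\qquad
\dfrac{\Gamma \to \neg P \qquad \Gamma \to P}{\Gamma \to \bot}\;(\neg\text{-elimination})
$$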



Since the rule ($\bot$-elim) is trivial (does nothing) when $P = \bot$, from now on, we will assume that $P \neq \bot$. Propositional minimal logic, denoted $\mathcal{NG}_m^{\Rightarrow,\wedge,\vee,\bot}$, is obtained by dropping the ($\bot$-elim) and (by-contra) rules. Propositional intuitionistic logic, denoted $\mathcal{NG}_i^{\Rightarrow,\wedge,\vee,\bot}$, is obtained by dropping the (by-contra) rule.



When we say that a proposition, $P$, is provable from $\Gamma$, we mean that we can construct a proof tree whose conclusion is $P$ and whose set of premises is $\Gamma$, in one of the systems $\mathcal{N}_c^{\Rightarrow,\wedge,\vee,\bot}$ or $\mathcal{NG}_c^{\Rightarrow,\wedge,\vee,\bot}$. Therefore, when we use the word “provable” unqualified, we mean provable in classical logic. If $P$ is provable from $\Gamma$ in one of the intuitionistic systems $\mathcal{N}_i^{\Rightarrow,\wedge,\vee,\bot}$ or $\mathcal{NG}_i^{\Rightarrow,\wedge,\vee,\bot}$, then we say intuitionistically provable (and similarly, if $P$ is provable from $\Gamma$ in one of the systems $\mathcal{N}_m^{\Rightarrow,\wedge,\vee,\bot}$ or $\mathcal{NG}_m^{\Rightarrow,\wedge,\vee,\bot}$, then we say provable in minimal logic). When $P$ is provable from $\Gamma$, most people write $\Gamma \vdash P$, or $\vdash \Gamma \to P$, sometimes with the name of the corresponding proof system tagged as a subscript on the sign $\vdash$ if necessary to avoid ambiguities. When $\Gamma$ is empty, we just say $P$ is provable (provable in intuitionistic logic, etc.) and write $\vdash P$.

We treat logical equivalence as a derived connective, that is, we view $P \equiv Q$ as an abbreviation for $(P \Rightarrow Q) \wedge (Q \Rightarrow P)$. In view of the inference rules for $\wedge$, we see that to prove a logical equivalence $P \equiv Q$, we just have to prove both implications $P \Rightarrow Q$ and $Q \Rightarrow P$.

In view of the $\bot$-elimination rule, the best way to interpret the provability of a negation, $\neg P$, is as “$P$ is not provable”. Indeed, if $\neg P$ and $P$ were both provable, then $\bot$ would be provable. So, $P$ should not be provable if $\neg P$ is. This is not the usual interpretation of negation in terms of truth values, but it turns out to be the most fruitful. Beware that if $P$ is not provable, then $\neg P$ is not provable in general! There are plenty of propositions such that neither $P$ nor $\neg P$ is provable (for instance, $P$, with $P$ an atomic proposition).

Let us now make some (much-needed) comments about the above inference rules. There is no need to repeat our comments regarding the $\Rightarrow$-rules.

The $\wedge$-introduction rule says that in order to prove a conjunction $P \wedge Q$ from some premises $\Gamma$, all we have to do is to prove both that $P$ is provable from $\Gamma$ and that $Q$ is provable from $\Gamma$. The $\wedge$-elimination rule says that once we have proved $P \wedge Q$ from $\Gamma$, then $P$ (and $Q$) is also provable from $\Gamma$. This makes sense intuitively, as $P \wedge Q$ is “stronger” than $P$ and $Q$ separately ($P \wedge Q$ is true iff both $P$ and $Q$ are true).

The $\vee$-introduction rule says that if $P$ (or $Q$) has been proved from $\Gamma$, then $P \vee Q$ is also provable from $\Gamma$. Again, this makes sense intuitively, as $P \vee Q$ is “weaker” than $P$ and $Q$. The $\vee$-elimination rule formalizes the proof-by-cases method. It is a more subtle rule. The idea is that if we can prove $R$ (possibly also using premises in $\Gamma$) both in the case where $P$ is assumed and in the case where $Q$ is assumed, and if $P \vee Q$ is provable from $\Gamma$, then, as we have “covered both cases”, it should be possible to prove $R$ from $\Gamma$ only (i.e., the premises $P$ and $Q$ are discharged).

The $\bot$-elimination rule formalizes the principle that once a false statement has been established, then anything should be provable.

The proof-by-contradiction rule formalizes the method of proof by contradiction! That is, in order to prove that $P$ can be deduced from some premises $\Gamma$, one may assume the negation, $\neg P$, of $P$ (intuitively, assume that $P$ is false) and then derive a contradiction from $\Gamma$ and $\neg P$ (i.e., derive falsity). Then, $P$ actually follows from $\Gamma$ without using $\neg P$ as a premise, i.e., $\neg P$ is discharged.

Most people, I believe, will be comfortable with the rules of minimal logic and will agree that they constitute a “reasonable” formalization of the rules of reasoning involving $\Rightarrow$, $\wedge$ and $\vee$. Indeed, these rules seem to express the intuitive meaning of the connectives $\Rightarrow$, $\wedge$ and $\vee$. However, some may question the two rules $\bot$-elimination and proof-by-contradiction. Indeed, their meaning is not as clear and, certainly, the proof-by-contradiction rule introduces a form of indirect reasoning that is somewhat worrisome.

The problem has to do with the meaning of disjunction and negation and, more generally, with the notion of constructivity in mathematics. In fact, in the early 1900’s, some mathematicians, especially L. Brouwer (1881-1966), questioned the validity of the proof-by-contradiction rule, among other principles. Two specific cases illustrate the problem, namely, the propositions

$$
P \vee \neg P \quad \text{and} \quad \neg\neg P \Rightarrow P.
$$

As we will see shortly, the above propositions are both provable in classical logic. Now, Brouwer and some mathematicians belonging to his school of thought (the so-called “intuitionists” or “constructivists”) advocate that in order to prove a disjunction $P \vee Q$ (from some premises $\Gamma$), one has to either exhibit a proof of $P$ or a proof of $Q$ (from $\Gamma$). However, it can be shown that this fails for $P \vee \neg P$. The fact that $P \vee \neg P$ is provable (in classical logic) does not imply that either $P$ is provable or that $\neg P$ is provable! The fact that $P \vee \neg P$ is provable is sometimes called the principle of the excluded middle! In intuitionistic logic, $P \vee \neg P$ is not provable. Of course, if one gives up the proof-by-contradiction rule, then fewer propositions become provable. On the other hand, one may claim that the propositions that remain provable have more constructive proofs and thus feel on safer ground.

A similar controversy arises with $\neg\neg P \Rightarrow P$. If we give up the proof-by-contradiction rule, then this formula is no longer provable, i.e., $\neg\neg P$ is no longer equivalent to $P$. Perhaps this relates to the fact that if one says

“I don’t have no money”

then this does not mean that this person has money! (Similarly with “I don’t get no satisfaction”, ...). However, note that one can still prove $P \Rightarrow \neg\neg P$ in minimal logic (try doing it!). Even stranger, $\neg\neg\neg P \Rightarrow \neg P$ is provable in intuitionistic (and minimal) logic, so $\neg\neg\neg P$ and $\neg P$ are equivalent intuitionistically!

Remark: Suppose we have a deduction

$$
\begin{matrix}\Gamma, \neg P\\ \vdots\\ \bot\end{matrix}
$$

as in the proof-by-contradiction rule. Then, by $\neg$-introduction, we get a deduction of $\neg\neg P$ from $\Gamma$:

$$
\dfrac{\begin{matrix}\Gamma, \neg P^{x}\\ \vdots\\ \bot\end{matrix}}{\neg\neg P}\;x
$$

So, if we knew that $\neg\neg P$ was equivalent to $P$ (actually, if we knew that $\neg\neg P \Rightarrow P$ is provable), then the proof-by-contradiction rule would be justified as a valid rule (it follows from modus ponens). We can view the proof-by-contradiction rule as a sort of act of faith that consists in saying that if we can derive an inconsistency (i.e., chaos) by assuming the falsity of a statement $P$, then $P$ has to hold in the first place. It is not so clear that such an act of faith is justified and the intuitionists refuse to take it!

Constructivity in mathematics is a fascinating subject but it is a topic that is really 
outside the scope of this course. What we hope is that our brief and very incomplete 
discussion of constructivity issues made the reader aware that the rules of logic are not cast in stone and that, in particular, there isn’t only one logic.

We feel safe in saying that most mathematicians work with classical logic and only a few of them have reservations about using the proof-by-contradiction rule. Nevertheless, intuitionistic logic has its advantages, especially when it comes to proving the correctness of programs (a branch of computer science!). We will come back to this point several times in this course.

In the rest of this section, we make further useful remarks about (classical) logic and give some explicit examples of proofs illustrating the inference rules of classical logic. We begin by proving that $P \vee \neg P$ is provable in classical logic.

Proposition 1.3.3 The proposition $P \vee \neg P$ is provable in classical logic.

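Proof. We prove that $P \vee (P \Rightarrow \bot)$ is provable by using the proof-by-contradiction rule as shown below:

$$
\dfrac{\dfrac{((P \vee (P \Rightarrow \bot)) \Rightarrow \bot)^{y} \qquad \dfrac{\dfrac{\dfrac{((P \vee (P \Rightarrow \bot)) \Rightarrow \bot)^{y} \qquad \dfrac{P^{x}}{P \vee (P \Rightarrow \bot)}}{\bot}}{P \Rightarrow \bot}\;x}{P \vee (P \Rightarrow \bot)}}{\bot}}{P \vee (P \Rightarrow \bot)}\;y\ (\text{by-contra})
$$

□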



Next, we consider the equivalence of $P$ and $\neg\neg P$.

Proposition 1.3.4 The proposition $P \Rightarrow \neg\neg P$ is provable in minimal logic. The proposition $\neg\neg P \Rightarrow P$ is provable in classical logic. Therefore, in classical logic, $P$ is equivalent to $\neg\neg P$.

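Proof. We leave the proof that $P \Rightarrow \neg\neg P$ is provable in minimal logic as an exercise. Below is a proof of $\neg\neg P \Rightarrow P$ using the proof-by-contradiction rule:

$$
\dfrac{\dfrac{\dfrac{((P \Rightarrow \bot) \Rightarrow \bot)^{y} \qquad (P \Rightarrow \bot)^{x}}{\bot}}{P}\;x\ (\text{by-contra})}{((P \Rightarrow \bot) \Rightarrow \bot) \Rightarrow P}\;y
$$

□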

The next proposition shows why $\bot$ can be viewed as the “ultimate” contradiction.
Proposition 1.3.5 In intuitionistic logic, the propositions $\bot$ and $P \wedge \neg P$ are equivalent for all $P$. Thus, $\bot$ and $P \wedge \neg P$ are also equivalent in classical propositional logic.

Proof. We need to show that both $\bot \Rightarrow (P \wedge \neg P)$ and $(P \wedge \neg P) \Rightarrow \bot$ are provable in intuitionistic logic. The provability of $\bot \Rightarrow (P \wedge \neg P)$ is an immediate consequence of $\bot$-elimination, with $\Gamma = \emptyset$. For $(P \wedge \neg P) \Rightarrow \bot$, we have the following proof:

$$
\dfrac{\dfrac{\dfrac{(P \wedge \neg P)^{x}}{\neg P} \qquad \dfrac{(P \wedge \neg P)^{x}}{P}}{\bot}}{(P \wedge \neg P) \Rightarrow \bot}\;x
$$

□

So, in intuitionistic logic (and also in classical logic), $\bot$ is equivalent to $P \wedge \neg P$ for all $P$. This means that $\bot$ is the “ultimate” contradiction; it corresponds to total inconsistency. By the way, we could have the bad luck that the system $\mathcal{N}_c^{\Rightarrow,\wedge,\vee,\bot}$ (or $\mathcal{N}_i^{\Rightarrow,\wedge,\vee,\bot}$ or even $\mathcal{N}_m^{\Rightarrow,\wedge,\vee,\bot}$) is inconsistent, that is, that $\bot$ is provable! Fortunately, this is not the case, although this is hard to prove. (It is also the case that $P \vee \neg P$ and $\neg\neg P \Rightarrow P$ are not provable in intuitionistic logic, but this too is hard to prove!)

1.4 Clearing Up Differences Between $\neg$-introduction, $\bot$-elimination and RAA

The differences between the rules $\neg$-introduction, $\bot$-elimination and the proof-by-contradiction rule (RAA) are often unclear to the uninitiated reader and this tends to cause confusion. In this section, we will try to clear up some common misconceptions about these rules.

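Confusion 1. Why is RAA not a special case of $\neg$-introduction?

$$
\dfrac{\begin{matrix}\Gamma, P^{x}\\ \vdots\\ \bot\end{matrix}}{\neg P}\;x\ (\neg\text{-intro})
\qquad\qquad
\dfrac{\begin{matrix}\Gamma, \neg P^{x}\\ \vdots\\ \bot\end{matrix}}{P}\;x\ (\text{RAA})
$$

The only apparent difference between $\neg$-introduction (on the left) and RAA (on the right) is that in RAA, the premise $P$ is negated but the conclusion is not, whereas in $\neg$-introduction the premise $P$ is not negated but the conclusion is.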

The important difference is that the conclusion of RAA is not negated. If we had applied $\neg$-introduction instead of RAA on the right, we would have obtained

$$
\dfrac{\begin{matrix}\Gamma, P^{x}\\ \vdots\\ \bot\end{matrix}}{\neg P}\;x\ (\neg\text{-intro})
\qquad\qquad
\dfrac{\begin{matrix}\Gamma, \neg P^{x}\\ \vdots\\ \bot\end{matrix}}{\neg\neg P}\;x\ (\neg\text{-intro})
$$

where the conclusion would have been $\neg\neg P$ as opposed to $P$. However, as we already said earlier, $\neg\neg P \Rightarrow P$ is not provable intuitionistically. Consequently, RAA is not a special case of $\neg$-introduction.

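Confusion 2. Is there any difference between $\bot$-elimination and RAA?

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ \bot\end{matrix}}{P}\;(\bot\text{-elim})
\qquad\qquad
\dfrac{\begin{matrix}\Gamma, \neg P^{x}\\ \vdots\\ \bot\end{matrix}}{P}\;x\ (\text{RAA})
$$

The difference is that $\bot$-elimination does not discharge any of its premises. In fact, RAA is a stronger rule which implies $\bot$-elimination, as we now demonstrate.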

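RAA implies $\bot$-elimination.

Suppose we have a deduction

$$
\begin{matrix}\Gamma\\ \vdots\\ \bot\end{matrix}
$$

Then, for any proposition $P$, we can add the premise $\neg P$ to every leaf of the above deduction tree and we get the deduction tree

$$
\begin{matrix}\Gamma, \neg P\\ \vdots\\ \bot\end{matrix}
$$

We can now apply RAA to get the following deduction tree of $P$ from $\Gamma$ (since $\neg P$ is discharged), and this is just the result of $\bot$-elimination:

$$
\dfrac{\begin{matrix}\Gamma, \neg P^{x}\\ \vdots\\ \bot\end{matrix}}{P}\;x\ (\text{RAA})
$$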

The above considerations also show that RAA is obtained from $\bot$-elimination by adding the new rule of $\neg\neg$-elimination:

$$
\dfrac{\begin{matrix}\Gamma\\ \vdots\\ \neg\neg P\end{matrix}}{P}\;(\neg\neg\text{-elimination})
$$



Some authors prefer adding the $\neg\neg$-elimination rule to intuitionistic logic instead of RAA in order to obtain classical logic. As we just demonstrated, the two additions are equivalent: by adding either RAA or $\neg\neg$-elimination to intuitionistic logic, we get classical logic.
There is another way to obtain RAA from the rules of intuitionistic logic, this time using the propositions of the form $P \vee \neg P$. We saw in Proposition 1.3.3 that all formulae of the form $P \vee \neg P$ are provable in classical logic (using RAA).

Confusion 3. Are propositions of the form $P \vee \neg P$ provable in intuitionistic logic?

The answer is no, which may be disturbing to some readers. In fact, it is quite difficult to prove that propositions of the form $P \vee \neg P$ are not provable in intuitionistic logic. One method consists in using the fact that intuitionistic proofs can be normalized (see Section 1.7 for more on normalization of proofs). Another method uses Kripke models (see van Dalen [42]).
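Kripke models are not developed in this course. Purely as an illustration, and assuming the standard forcing clauses (an implication is forced at a world when every later world that forces the premise also forces the conclusion, and $\bot$ is never forced), here is a minimal Python sketch of a two-world countermodel. By the soundness of Kripke semantics for intuitionistic logic, a proposition that fails at some world of some Kripke model is not intuitionistically provable. The world names `w0`, `w1` and the tuple encoding are our own.

```python
# A two-world Kripke model: w0 <= w1, and the atom P becomes true only at w1.
WORLDS = ["w0", "w1"]
LEQ = {("w0", "w0"), ("w0", "w1"), ("w1", "w1")}  # reflexive and transitive
VAL = {"w0": set(), "w1": {"P"}}                  # atoms true at each world (monotone)

BOT = ("bot",)


def later(w):
    """All worlds v with w <= v (including w itself)."""
    return [v for v in WORLDS if (w, v) in LEQ]


def forces(w, f):
    """Does world w force the proposition f?"""
    if isinstance(f, str):  # atomic proposition
        return f in VAL[w]
    op = f[0]
    if op == "bot":
        return False
    if op == "and":
        return forces(w, f[1]) and forces(w, f[2])
    if op == "or":
        return forces(w, f[1]) or forces(w, f[2])
    if op == "=>":  # must hold at every later world
        return all(forces(v, f[2]) for v in later(w) if forces(v, f[1]))
    raise ValueError(f"unknown connective {op!r}")


def neg(f):
    return ("=>", f, BOT)  # not P abbreviates P => bot


excluded_middle = ("or", "P", neg("P"))
double_negation = ("=>", neg(neg("P")), "P")

print(forces("w0", excluded_middle))             # False
print(forces("w0", double_negation))             # False
print(forces("w0", ("=>", "P", neg(neg("P")))))  # True
```

At `w0`, $P$ is not yet true, but $\neg P$ is not forced either, because $P$ becomes true at the later world `w1`. The minimally provable $P \Rightarrow \neg\neg P$ is forced, as it must be.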

Part of the difficulty in understanding at some intuitive level why propositions of the form $P \vee \neg P$ are not provable in intuitionistic logic is that the notion of truth based on the truth values true and false is deeply rooted in all of us. In this frame of mind, it seems ridiculous to question the provability of $P \vee \neg P$, since its truth value is true whether $P$ is assigned the value true or false. Classical two-valued truth value semantics is too crude for intuitionistic logic.

Another difficulty is that it is tempting to equate the notion of truth and the notion of provability. Unfortunately, because classical truth value semantics is too crude for intuitionistic logic, there are propositions that are universally true (i.e., they evaluate to true for all possible truth assignments of the atomic letters in them) and yet they are not provable intuitionistically. The propositions $P \vee \neg P$ and $\neg\neg P \Rightarrow P$ are such examples.
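A truth-table check makes the gap concrete: it confirms that both propositions are true under every assignment, yet, as just explained, this says nothing about intuitionistic provability. The sketch below is a plain brute-force tautology checker in Python; the helper names are ours.

```python
from itertools import product


def implies(a, b):
    """Classical truth table of =>."""
    return (not a) or b


def is_tautology(f, n):
    """True if the Boolean function f of n atoms is true under all 2**n assignments."""
    return all(f(*values) for values in product([False, True], repeat=n))


def excluded_middle(p):
    return p or (not p)                # P or not P


def double_negation(p):
    return implies(not (not p), p)     # not not P => P


print(is_tautology(excluded_middle, 1))  # True
print(is_tautology(double_negation, 1))  # True
```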

One of the major motivations for advocating intuitionistic logic is that it yields proofs that are more constructive than classical proofs. For example, in classical logic, when we prove a disjunction $P \vee Q$, we generally can’t conclude that either $P$ or $Q$ is provable, as exemplified by $P \vee \neg P$. A more interesting example involving a non-constructive proof of a disjunction will be given in Section 1.5. But, in intuitionistic logic, from a proof of $P \vee Q$, it is possible to extract either a proof of $P$ or a proof of $Q$ (and similarly for existential statements, see Section 1.6). This property is not easy to prove. It is a consequence of the normal form for intuitionistic proofs (see Section 1.7).

In brief, besides being a fun intellectual game, intuitionistic logic is only an interesting 
alternative to classical logic if we care about the constructive nature of our proofs. But 
then, we are forced to abandon the classical two-valued truth value semantics and adopt 
other semantics such as Kripke semantics. If we do not care about the constructive nature 
of our proofs and if we want to stick to two-valued truth value semantics, then we should 
stick to classical logic. Most people do that, so don’t feel bad if you are not comfortable 
with intuitionistic logic! 

One way to gauge how intuitionistic logic differs from classical logic is to ask what kind of propositions need to be added to intuitionistic logic in order to get classical logic. It turns out that if all the propositions of the form $P \vee \neg P$ are considered to be axioms, then RAA follows from some of the rules of intuitionistic logic.
RAA holds in intuitionistic logic + all axioms $P \vee \neg P$.

The proof involves a subtle use of the $\bot$-elimination and $\vee$-elimination rules which may be a bit puzzling. Assume, as we do when we use the proof-by-contradiction rule (RAA), that we have a deduction

$$
\begin{matrix}\Gamma, \neg P\\ \vdots\\ \bot\end{matrix}
$$

Here is the deduction tree demonstrating that RAA is a derived rule:

$$
\dfrac{P \vee \neg P \qquad P^{x} \qquad \dfrac{\begin{matrix}\Gamma, \neg P^{y}\\ \vdots\\ \bot\end{matrix}}{P}\;(\bot\text{-elim})}{P}\;x, y\ (\vee\text{-elim})
$$

At first glance, the rightmost subtree

$$
\dfrac{\begin{matrix}\Gamma, \neg P^{y}\\ \vdots\\ \bot\end{matrix}}{P}\;(\bot\text{-elim})
$$

appears to use RAA and our argument looks circular! But this is not so, because the premise $\neg P$ labeled $y$ is not discharged in the step that yields $P$ as conclusion; the step that yields $P$ is a $\bot$-elimination step. The premise $\neg P$ labeled $y$ is actually discharged by the $\vee$-elimination rule (and so is the premise $P$ labeled $x$). So, our argument establishing RAA is not circular after all!

In conclusion, intuitionistic logic is obtained from classical logic by taking away the proof-by-contradiction rule (RAA). In this more restrictive proof system, we obtain more constructive proofs. In that sense, the situation is better than in classical logic. The major drawback is that we can’t think in terms of classical truth value semantics anymore.

Conversely, classical logic is obtained from intuitionistic logic in at least three ways:

1. Add the proof-by-contradiction rule (RAA).

2. Add the $\neg\neg$-elimination rule.

3. Add all propositions of the form $P \vee \neg P$ as axioms.
1.5 Other Rules of Classical Logic and Examples of Proofs

In classical logic, we have the de Morgan laws:

Proposition 1.5.1 The following equivalences (de Morgan laws) are provable in classical logic:

$$
\neg(P \wedge Q) \equiv \neg P \vee \neg Q
$$

$$
\neg(P \vee Q) \equiv \neg P \wedge \neg Q.
$$

In fact, $\neg(P \vee Q) \equiv \neg P \wedge \neg Q$ and $(\neg P \vee \neg Q) \Rightarrow \neg(P \wedge Q)$ are provable in intuitionistic logic. The proposition $(P \wedge \neg Q) \Rightarrow \neg(P \Rightarrow Q)$ is provable in intuitionistic logic and $\neg(P \Rightarrow Q) \Rightarrow (P \wedge \neg Q)$ is provable in classical logic. Therefore, $\neg(P \Rightarrow Q)$ and $P \wedge \neg Q$ are equivalent in classical logic. Furthermore, $P \Rightarrow Q$ and $\neg P \vee Q$ are equivalent in classical logic and $(\neg P \vee Q) \Rightarrow (P \Rightarrow Q)$ is provable in intuitionistic logic.

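Proof. Here is an intuitionistic proof of $(\neg P \vee Q) \Rightarrow (P \Rightarrow Q)$:

$$
\dfrac{\dfrac{(\neg P \vee Q)^{w} \qquad \dfrac{\dfrac{\dfrac{\neg P^{z} \qquad P^{x}}{\bot}}{Q}}{P \Rightarrow Q}\;x \qquad \dfrac{Q^{t}}{P \Rightarrow Q}}{P \Rightarrow Q}\;z, t}{(\neg P \vee Q) \Rightarrow (P \Rightarrow Q)}\;w
$$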

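Here is a classical proof of $(P \Rightarrow Q) \Rightarrow (\neg P \vee Q)$:

$$
\dfrac{\dfrac{\dfrac{(\neg(\neg P \vee Q))^{y} \qquad \dfrac{\dfrac{(P \Rightarrow Q)^{z} \qquad \dfrac{\dfrac{(\neg(\neg P \vee Q))^{y} \qquad \dfrac{\neg P^{x}}{\neg P \vee Q}}{\bot}}{P}\;x\ (\text{RAA})}{Q}}{\neg P \vee Q}}{\bot}}{\neg P \vee Q}\;y\ (\text{RAA})}{(P \Rightarrow Q) \Rightarrow (\neg P \vee Q)}\;z
$$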
The other proofs are left as exercises. □
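The classical equivalences of Proposition 1.5.1 can be spot-checked with truth tables. The Python sketch below (helper names ours) only confirms classical validity; as explained in Section 1.4, truth tables cannot tell which of these directions are also intuitionistically provable.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def equivalent(f, g):
    """Classical equivalence of two Boolean functions of P and Q:
    same value under all four truth assignments."""
    return all(f(p, q) == g(p, q) for p, q in product([False, True], repeat=2))


CHECKS = {
    "not(P and Q) == not P or not Q":
        (lambda p, q: not (p and q), lambda p, q: (not p) or (not q)),
    "not(P or Q) == not P and not Q":
        (lambda p, q: not (p or q), lambda p, q: (not p) and (not q)),
    "not(P => Q) == P and not Q":
        (lambda p, q: not implies(p, q), lambda p, q: p and (not q)),
    "(P => Q) == not P or Q":
        (lambda p, q: implies(p, q), lambda p, q: (not p) or q),
}

for name, (f, g) in CHECKS.items():
    print(name, equivalent(f, g))  # each line ends with True
```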

Propositions 1.3.4 and 1.5.1 show a property that is very specific to classical logic, namely, that the logical connectives $\Rightarrow$, $\wedge$, $\vee$, $\neg$ are not independent. For example, we have $P \wedge Q \equiv \neg(\neg P \vee \neg Q)$, which shows that $\wedge$ can be expressed in terms of $\vee$ and $\neg$. In intuitionistic logic, $\wedge$ and $\vee$ cannot be expressed in terms of each other via negation.

The fact that the logical connectives $\Rightarrow$, $\wedge$, $\vee$, $\neg$ are not independent in classical logic suggests the following question: Are there propositions, written in terms of $\Rightarrow$ only, that are provable classically but not provable intuitionistically?

The answer is yes! For instance, the proposition $((P \Rightarrow Q) \Rightarrow P) \Rightarrow P$ (known as Peirce’s law) is provable classically (do it) but it can be shown that it is not provable intuitionistically.

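In addition to the proof-by-cases method and the proof-by-contradiction method, we also have the proof-by-contrapositive method valid in classical logic:

Proof-by-contrapositive rule:

$$
\dfrac{\begin{matrix}\Gamma, \neg Q^{x}\\ \vdots\\ \neg P\end{matrix}}{P \Rightarrow Q}\;x
$$

This rule says that in order to prove an implication $P \Rightarrow Q$ (from $\Gamma$), one may assume $\neg Q$ as proved, and then deduce that $\neg P$ is provable from $\Gamma$ and $\neg Q$. This inference rule is valid in classical logic because we can construct the following proof:

$$
\dfrac{\dfrac{\dfrac{\begin{matrix}\Gamma, \neg Q^{x}\\ \vdots\\ \neg P\end{matrix} \qquad P^{y}}{\bot}}{Q}\;x\ (\text{by-contra})}{P \Rightarrow Q}\;y
$$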

We will now give some explicit examples of proofs illustrating the proof principles that 
we just discussed. 

Recall that the set of integers is the set

$$
\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}
$$

and that the set of natural numbers is the set

$$
\mathbb{N} = \{0, 1, 2, \ldots\}.
$$

(Some authors exclude 0 from $\mathbb{N}$. We don’t like this discrimination against zero.) An integer is even if it is divisible by 2, that is, if it can be written as $2k$, where $k \in \mathbb{Z}$. An integer is odd if it is not divisible by 2, that is, if it can be written as $2k + 1$, where $k \in \mathbb{Z}$. The following facts are essentially obvious:



(a) The sum of even integers is even.

(b) The sum of an even integer and of an odd integer is odd. 

(c) The sum of two odd integers is even. 

(d) The product of odd integers is odd. 

(e) The product of an even integer with any integer is even. 

Now, we prove the following fact using the proof by cases method. 

Proposition 1.5.2 Let $a, b, c$ be odd integers. For any integers $p$ and $q$, if $p$ and $q$ are not both even, then

$$
ap^2 + bpq + cq^2
$$

is odd.

Proof. We consider the three cases:

1. $p$ and $q$ are odd. In this case, as $a$, $b$ and $c$ are odd, by (d) all the products $ap^2$, $bpq$ and $cq^2$ are odd. By (c), $ap^2 + bpq$ is even and by (b), $ap^2 + bpq + cq^2$ is odd.

2. $p$ is even and $q$ is odd. In this case, by (e), both $ap^2$ and $bpq$ are even and by (d), $cq^2$ is odd. But then, by (a), $ap^2 + bpq$ is even and by (b), $ap^2 + bpq + cq^2$ is odd.

3. $p$ is odd and $q$ is even. This case is analogous to the previous case, except that $p$ and $q$ are interchanged. The reader should have no trouble filling in the details.

Since these three cases exhaust all possibilities for $p$ and $q$ not to be both even, the proof is complete by the $\vee$-elimination rule (applied twice). □
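The case analysis can be spot-checked by brute force. This Python sketch (the function name and the search bound are arbitrary choices) looks for a counterexample to Proposition 1.5.2 over a finite range; finding none is evidence, not a proof, which is why the case analysis above is still needed.

```python
from itertools import product


def is_odd(n):
    return n % 2 == 1  # Python's % with modulus 2 returns 0 or 1, even for negative n


def counterexample(bound=7):
    """Search odd a, b, c and integers p, q (not both even) in [-bound, bound]
    for an even value of a*p**2 + b*p*q + c*q**2; return None if there is none."""
    nums = range(-bound, bound + 1)
    odds = [k for k in nums if is_odd(k)]
    for a, b, c in product(odds, repeat=3):
        for p, q in product(nums, repeat=2):
            if is_odd(p) or is_odd(q):  # p and q are not both even
                if not is_odd(a * p * p + b * p * q + c * q * q):
                    return (a, b, c, p, q)
    return None


print(counterexample())  # None
```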

The set of rational numbers $\mathbb{Q}$ consists of all fractions $p/q$, where $p, q \in \mathbb{Z}$, with $q \neq 0$. We now use Proposition 1.5.2 and the proof-by-contradiction method to prove

Proposition 1.5.3 Let $a, b, c$ be odd integers. Then, the equation

$$
aX^2 + bX + c = 0
$$

has no rational solution $X$.

Proof. We proceed by contradiction (by this, we mean that we use the proof-by-contradiction rule). So, assume that there is a rational solution $X = p/q$. We may assume that $p$ and $q$ have no common divisor other than $\pm 1$, which implies that $p$ and $q$ are not both even. As $q \neq 0$, if $aX^2 + bX + c = 0$, then by multiplying by $q^2$, we get

$$
ap^2 + bpq + cq^2 = 0.
$$

However, as $p$ and $q$ are not both even and $a, b, c$ are odd, we know from Proposition 1.5.2 that $ap^2 + bpq + cq^2$ is odd, and in particular nonzero. This contradicts the fact that $ap^2 + bpq + cq^2 = 0$ and thus finishes the proof. □
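For a finite sanity check of Proposition 1.5.3, one can search for rational roots directly. The Python sketch below uses a different standard argument, the rational root theorem (a root $p/q$ in lowest terms of a polynomial with integer coefficients must have $p$ dividing the constant term and $q$ dividing the leading coefficient), rather than the parity argument of the proof above. It tests a finite range of coefficients and is not a proof; the function names are ours.

```python
from fractions import Fraction
from itertools import product


def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(a, b, c):
    """All rational roots of a*X**2 + b*X + c, assuming a != 0 and c != 0.

    By the rational root theorem, a root p/q in lowest terms has p | c and q | a.
    """
    roots = set()
    for p in divisors(c):
        for q in divisors(a):
            for x in (Fraction(p, q), Fraction(-p, q)):
                if a * x * x + b * x + c == 0:
                    roots.add(x)
    return roots


print(rational_roots(2, -3, 1))  # 2X^2 - 3X + 1 = (2X - 1)(X - 1): roots 1/2 and 1

odd = [k for k in range(-9, 10) if k % 2 != 0]
print(all(not rational_roots(a, b, c) for a, b, c in product(odd, repeat=3)))  # True
```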

As an example of the proof-by-contrapositive method, we prove that if an integer $n^2$ is even, then $n$ must be even.

Observe that if an integer is not even then it is odd (and vice versa). Thus, the contrapositive of our statement is: If $n$ is odd, then $n^2$ is odd. But, to say that $n$ is odd is to say that $n = 2k + 1$ and then, $n^2 = (2k + 1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1$, which shows that $n^2$ is odd.
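The algebra in the contrapositive can be spot-checked numerically. This Python snippet (the function name is ours) tests the identity $n^2 = 2(2k^2 + 2k) + 1$ and the statement itself on a finite range, which illustrates but does not replace the proof.

```python
def square_of_odd(k):
    """Return n**2 for n = 2k + 1, checking the identity n**2 = 2(2k**2 + 2k) + 1."""
    n = 2 * k + 1
    assert n * n == 2 * (2 * k * k + 2 * k) + 1
    return n * n


# If n**2 is even then n is even, checked on a finite range.
print(all(n % 2 == 0 for n in range(-1000, 1001) if (n * n) % 2 == 0))  # True
# The contrapositive: squares of odd numbers are odd.
print(all(square_of_odd(k) % 2 == 1 for k in range(-1000, 1000)))       # True
```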

A real number $a \in \mathbb{R}$ is said to be irrational if it cannot be expressed as a number in $\mathbb{Q}$ (a fraction). The reader should prove that $\sqrt{2}$ is irrational by adapting the arguments used in the two previous propositions.

Remark: Let us return briefly to the issue of constructivity in classical logic, in particular when it comes to disjunctions. Consider the question: are there two irrational real numbers $a$ and $b$ such that $a^b$ is rational? Here is a way to prove that this is indeed the case. Consider the number $\sqrt{2}^{\sqrt{2}}$. If this number is rational, then $a = \sqrt{2}$ and $b = \sqrt{2}$ is an answer to our question (since we already know that $\sqrt{2}$ is irrational). Now, observe that

$$
\left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}} = \sqrt{2}^{\sqrt{2} \times \sqrt{2}} = \sqrt{2}^{2} = 2 \quad \text{is rational!}
$$

Thus, if $\sqrt{2}^{\sqrt{2}}$ is irrational, then $a = \sqrt{2}^{\sqrt{2}}$ and $b = \sqrt{2}$ is an answer to our question. So, we proved that

($\sqrt{2}$ is irrational and $\sqrt{2}^{\sqrt{2}}$ is rational) or

($\sqrt{2}^{\sqrt{2}}$ and $\sqrt{2}$ are irrational and $\left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}}$ is rational).

However, the above proof does not tell us whether $\sqrt{2}^{\sqrt{2}}$ is rational or not!

We see one of the shortcomings of classical reasoning: certain statements (in particular, disjunctive or existential) are provable but their proof does not provide an explicit answer. It is in that sense that classical logic is not constructive.
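Floating-point arithmetic can confirm the identity $\left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}} = 2$ up to rounding, but it cannot settle whether $\sqrt{2}^{\sqrt{2}}$ is rational, which is exactly the information the classical proof does not provide. A minimal Python illustration:

```python
import math

r = math.sqrt(2)
a = r ** r  # sqrt(2) ** sqrt(2); its rationality is left open by the proof
b = a ** r  # (sqrt(2) ** sqrt(2)) ** sqrt(2), equal to 2 in exact arithmetic

print(a)
print(abs(b - 2) < 1e-12)  # True: agrees with 2 up to floating-point rounding
```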

Many more examples of non-constructive arguments in classical logic can be given. 

We now add quantifiers to our language and give the corresponding inference rules. 

1.6 Adding Quantifiers; The Proof Systems $\mathcal{N}_c^{\Rightarrow,\wedge,\vee,\forall,\exists,\bot}$, $\mathcal{NG}_c^{\Rightarrow,\wedge,\vee,\forall,\exists,\bot}$

As we mentioned in Section 1.1, atomic propositions may contain variables. The intention is that such variables correspond to arbitrary objects. An example is

$$
\text{human}(x) \Rightarrow \text{needs-to-drink}(x).
$$
Now, in mathematics, we usually prove universal statements, that is, statements that hold for all possible “objects”, or existential statements, that is, statements asserting the existence of some object satisfying a given property. As we saw earlier, we assert that every human needs to drink by writing the proposition

$$
\forall x(\text{human}(x) \Rightarrow \text{needs-to-drink}(x)).
$$

Observe that once the quantifier $\forall$ (pronounced “for all” or “for every”) is applied to the variable $x$, the variable $x$ becomes a place-holder and replacing $x$ by $y$ or any other variable does not change anything. What matters is the locations to which the outer $x$ points in the inner proposition. We say that $x$ is a bound variable (sometimes a “dummy variable”).

If we want to assert that some human needs to drink we write

$$
\exists x(\text{human}(x) \Rightarrow \text{needs-to-drink}(x)).
$$

Again, once the quantifier $\exists$ (pronounced “there exists”) is applied to the variable $x$, the variable $x$ becomes a place-holder. However, the intended meaning of the second proposition is very different and weaker than the first. It only asserts the existence of some object satisfying the statement

$$
\text{human}(x) \Rightarrow \text{needs-to-drink}(x).
$$

Statements may contain variables that are not bound by quantifiers. For example, in

$\forall y\, \mathrm{parent}(x, y)$

the variable $y$ is bound but the variable $x$ is not. Here, the intended meaning of $\mathrm{parent}(x, y)$ is that $x$ is a parent of $y$. Variables that are not bound are called free. The proposition

$\forall y \exists x\, \mathrm{parent}(x, y)$,

which contains only bound variables, is meant to assert that every $y$ has some parent $x$. Typically, in mathematics, we only prove statements without free variables. However, statements with free variables may occur during intermediate stages of a proof.
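The distinction between free and bound occurrences is purely syntactic, so it can be computed. Here is a minimal Python sketch that returns the set of free variables of a formula. The tuple encoding of terms and formulae and the names `term_vars` and `free_vars` are assumptions made only for this illustration; they are not notation used in the text.

```python
# Encoding assumed for this sketch only:
#   term    : a str is a variable; a tuple (f, t1, ..., tn) applies the function
#             symbol f to terms (a constant is a tuple with no arguments).
#   formula : ('atom', P, (t1, ..., tn)), ('bot',), ('not', A), ('and', A, B),
#             ('or', A, B), ('imp', A, B), ('forall', x, A), ('exists', x, A).


def term_vars(t):
    """Set of variables occurring in a term."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(arg) for arg in t[1:]))


def free_vars(phi):
    """Set of variables having at least one occurrence not bound by a quantifier."""
    tag = phi[0]
    if tag == 'atom':
        return set().union(*(term_vars(arg) for arg in phi[2]))
    if tag == 'bot':
        return set()
    if tag == 'not':
        return free_vars(phi[1])
    if tag in ('and', 'or', 'imp'):
        return free_vars(phi[1]) | free_vars(phi[2])
    if tag in ('forall', 'exists'):
        # The quantified variable is bound everywhere inside the body.
        return free_vars(phi[2]) - {phi[1]}
    raise ValueError(f"unknown formula tag: {tag!r}")


# forall y parent(x, y): y is bound, x is free.
f1 = ('forall', 'y', ('atom', 'parent', ('x', 'y')))
# forall y exists x parent(x, y): every variable is bound.
f2 = ('forall', 'y', ('exists', 'x', ('atom', 'parent', ('x', 'y'))))

print(free_vars(f1))  # {'x'}
print(free_vars(f2))  # set()
```

The quantifier case is the only place where variables disappear from the result, which is exactly what makes a variable “bound”.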

The intuitive meaning of the statement $\forall xP$ is that $P$ holds for all possible objects $x$, and the intuitive meaning of the statement $\exists xP$ is that $P$ holds for some object $x$. Thus, we see that it would be useful to use symbols to denote various objects. For example, if we want to assert some facts about the “parent” predicate, we may want to introduce some constant symbols (for short, constants) such as “Jean”, “Mia”, etc., and write

$\mathrm{parent}(\mathrm{Jean}, \mathrm{Mia})$

to assert that Jean is a parent of Mia. Often, we also have to use function symbols (or operators, constructors), for instance, to write statements about numbers: $+$, $*$, etc. Using constant symbols, function symbols and variables, we can form terms, such as

$(x^2 + 1)(3 * y + 2)$.
In addition to function symbols, we also use predicate symbols, which are names for atomic properties. We have already seen several examples of predicate symbols: “human”, “parent”. So, in general, when we try to prove properties of certain classes of objects (people, numbers, strings, graphs, etc.), we assume that we have a certain alphabet consisting of constant symbols, function symbols and predicate symbols. Using these symbols and an infinite supply of variables (assumed distinct from the variables which we use to label premises), we can form terms and predicate terms. We say that we have a (logical) language. Using this language, we can write compound statements.

Let us be a little more precise. In a first-order language, $L$, in addition to the logical connectives $\Rightarrow$, $\wedge$, $\vee$, $\perp$, $\forall$ and $\exists$, we have a set, $L$, of nonlogical symbols consisting of

(i) A set $CS$ of constant symbols, $c_1, c_2, \ldots$.

(ii) A set $FS$ of function symbols, $f_1, f_2, \ldots$. Each function symbol, $f$, has a rank, $n_f \geq 1$, which is the number of arguments of $f$.

(iii) A set $PS$ of predicate symbols, $P_1, P_2, \ldots$. Each predicate symbol, $P$, has a rank, $n_P \geq 0$, which is the number of arguments of $P$. Predicate symbols of rank 0 are propositional letters, as in earlier sections.

(iv) The equality predicate, $=$, is added to our language when we want to deal with equations.

(v) First-order variables, $t_1, t_2, \ldots$, used to form quantified formulae.

The difference between function symbols and predicate symbols is that function symbols are interpreted as functions defined on a structure (for example, addition, $+$, on $\mathbb{N}$), whereas predicate symbols are interpreted as properties of objects, that is, they take the value true or false. An example is the language of Peano arithmetic, $L = \{0, S, +, *, =\}$. Here, the intended structure is $\mathbb{N}$, $0$ is of course zero, $S$ is interpreted as the function $S(n) = n + 1$, the symbol $+$ is addition, $*$ is multiplication and $=$ is equality.

Using a first-order language, $L$, we can form terms, predicate terms and formulae. The terms over $L$ are the following expressions:

(i) Every variable, $t$, is a term;

(ii) Every constant symbol, $c \in CS$, is a term;

(iii) If $f \in FS$ is a function symbol taking $n$ arguments and $\tau_1, \ldots, \tau_n$ are terms already constructed, then $f(\tau_1, \ldots, \tau_n)$ is a term.

The predicate terms over $L$ are the following expressions:

(i) If $P \in PS$ is a predicate symbol taking $n$ arguments and $\tau_1, \ldots, \tau_n$ are terms already constructed, then $P(\tau_1, \ldots, \tau_n)$ is a predicate term. When $n = 0$, the predicate symbol, $P$, is a predicate term called a propositional letter.

(ii) When we allow the equality predicate, for any two terms $\tau_1$ and $\tau_2$, the expression $\tau_1 = \tau_2$ is a predicate term. It is usually called an equation.

The (first-order) formulae over $L$ are the following expressions:

(i) Every predicate term, $P(\tau_1, \ldots, \tau_n)$, is an atomic formula. This includes all propositional letters. We also view $\perp$ (and sometimes $\top$) as an atomic formula.

(ii) When we allow the equality predicate, every equation, $\tau_1 = \tau_2$, is an atomic formula.

(iii) If $P$ and $Q$ are formulae already constructed, then $P \Rightarrow Q$, $P \wedge Q$, $P \vee Q$, $\neg P$ are compound formulae. We treat $P \equiv Q$ as an abbreviation for $(P \Rightarrow Q) \wedge (Q \Rightarrow P)$, as before.

(iv) If $P$ is a formula already constructed and $t$ is any variable, then $\forall tP$ and $\exists tP$ are compound formulae.
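The inductive definitions of terms and formulae translate directly into a recursive well-formedness check. Below is a minimal Python sketch for the language of Peano arithmetic. The tuple encoding, the tables `FUNCTION_RANKS` and `PREDICATE_RANKS`, and the function names are assumptions made for this illustration; in particular, constants are treated as function symbols of rank 0 rather than as a separate set $CS$.

```python
# Hypothetical signature for the language of Peano arithmetic. Constants are
# treated here as function symbols of rank 0 (a simplification of CS vs FS).
FUNCTION_RANKS = {'0': 0, 'S': 1, '+': 2, '*': 2}
PREDICATE_RANKS = {'=': 2}


def is_term(t):
    """Clauses (i)-(iii) for terms: a str is a variable, (f, t1, ..., tn) an application."""
    if isinstance(t, str):                          # (i) a variable
        return True
    if not isinstance(t, tuple) or not t:
        return False
    symbol, args = t[0], t[1:]
    # (ii)/(iii): the number of arguments must equal the rank of the symbol
    return FUNCTION_RANKS.get(symbol) == len(args) and all(is_term(a) for a in args)


def is_formula(phi):
    """Clauses (i)-(iv) for formulae; an equation is an atom with predicate '='."""
    if not isinstance(phi, tuple) or not phi:
        return False
    tag = phi[0]
    if tag == 'atom':
        return (len(phi) == 3 and PREDICATE_RANKS.get(phi[1]) == len(phi[2])
                and all(is_term(a) for a in phi[2]))
    if tag == 'bot':
        return len(phi) == 1
    if tag == 'not':
        return len(phi) == 2 and is_formula(phi[1])
    if tag in ('and', 'or', 'imp'):
        return len(phi) == 3 and is_formula(phi[1]) and is_formula(phi[2])
    if tag in ('forall', 'exists'):
        return len(phi) == 3 and isinstance(phi[1], str) and is_formula(phi[2])
    return False


x_plus_S0 = ('+', 'x', ('S', ('0',)))                                   # x + S(0)
axiom = ('forall', 'x', ('not', ('atom', '=', (('S', 'x'), ('0',)))))  # forall x ~(S(x) = 0)

print(is_term(x_plus_S0))        # True
print(is_term(('S', 'x', 'y')))  # False: S has rank 1
print(is_formula(axiom))         # True
```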

All this can be made very precise, but this is quite tedious. Our primary goal is to explain the basic rules of logic and not to teach a full-fledged logic course. We hope that our intuitive explanations will suffice, and we now come to the heart of the matter, the inference rules for the quantifiers. Once again, for a complete treatment, readers are referred to Gallier [18], van Dalen [42] or Huth and Ryan [30].

Unlike the rules for $\Rightarrow$, $\vee$, $\wedge$ and $\perp$, which are rather straightforward, the rules for quantifiers are more subtle due to the presence of variables (occurring in terms and predicates). We have to be careful to forbid inferences that would yield “wrong” results, and for this we have to be very precise about the way we use free variables. More specifically, we have to exercise care when we make substitutions of terms for variables in propositions. For example, say we have the predicate “odd”, intended to express that a number is odd. Now, we can substitute the term $(2y + 1)^2$ for $x$ in $\mathrm{odd}(x)$ and obtain

$\mathrm{odd}((2y + 1)^2)$.

More generally, if $P(t_1, t_2, \ldots, t_n)$ is a statement containing the free variables $t_1, \ldots, t_n$ and if $\tau_1, \ldots, \tau_n$ are terms, we can form the new statement

$P[\tau_1/t_1, \ldots, \tau_n/t_n]$

obtained by substituting the term $\tau_i$ for all free occurrences of the variable $t_i$, for $i = 1, \ldots, n$. By the way, we denote terms by the Greek letter $\tau$ because we use the letter $t$ for a variable, and using $t$ for both variables and terms would be confusing; sorry!

However, if $P(t_1, t_2, \ldots, t_n)$ contains quantifiers, some bad things can happen; namely, some of the variables occurring in some term $\tau_i$ may become quantified when $\tau_i$ is substituted for $t_i$. For example, consider

$\forall x \exists y\, P(x, y, z)$,

which contains the free variable $z$, and substitute the term $x + y$ for $z$; we get

$\forall x \exists y\, P(x, y, x + y)$.

We see that the variables $x$ and $y$ occurring in the term $x + y$ become bound variables after substitution. We say that there is a “capture of variables”.

This is not what we intended to happen! To fix this problem, we recall that bound variables are really placeholders, so they can be renamed without changing anything. Therefore, we can rename the bound variables $x$ and $y$ in $\forall x \exists y\, P(x, y, z)$ to $u$ and $v$, getting the statement $\forall u \exists v\, P(u, v, z)$, and now, the result of the substitution is

$\forall u \exists v\, P(u, v, x + y)$.

Again, all this needs to be explained very carefully, but this can be done!
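One way to make this careful is a capture-avoiding substitution that renames a bound variable just before it would capture a variable of the term being substituted. Below is a minimal Python sketch, assuming a tuple encoding of terms and formulae chosen for illustration (a string is a variable, `(f, t1, ..., tn)` is a term, formulae are tagged tuples). The fresh-name scheme `u0, u1, ...` and all function names are hypothetical.

```python
import itertools

# Assumed encoding: a str is a variable, (f, t1, ..., tn) is a term, and formulae
# are ('atom', P, args), ('bot',), ('not', A), ('and'|'or'|'imp', A, B),
# ('forall'|'exists', x, A).


def term_vars(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(arg) for arg in t[1:]))


def free_vars(phi):
    tag = phi[0]
    if tag == 'atom':
        return set().union(*(term_vars(arg) for arg in phi[2]))
    if tag == 'bot':
        return set()
    if tag == 'not':
        return free_vars(phi[1])
    if tag in ('and', 'or', 'imp'):
        return free_vars(phi[1]) | free_vars(phi[2])
    return free_vars(phi[2]) - {phi[1]}          # 'forall' or 'exists'


def subst_term(t, x, tau):
    """t[tau/x] for a term t (terms contain no binders, so no capture here)."""
    if isinstance(t, str):
        return tau if t == x else t
    return (t[0],) + tuple(subst_term(arg, x, tau) for arg in t[1:])


def fresh(avoid):
    """First name u0, u1, u2, ... not in `avoid` (the naming scheme is arbitrary)."""
    for i in itertools.count():
        name = f"u{i}"
        if name not in avoid:
            return name


def subst(phi, x, tau):
    """phi[tau/x]: replace the free occurrences of x by tau, first renaming a
    bound variable whenever it would capture a variable of tau."""
    tag = phi[0]
    if tag == 'atom':
        return ('atom', phi[1], tuple(subst_term(a, x, tau) for a in phi[2]))
    if tag == 'bot':
        return phi
    if tag == 'not':
        return ('not', subst(phi[1], x, tau))
    if tag in ('and', 'or', 'imp'):
        return (tag, subst(phi[1], x, tau), subst(phi[2], x, tau))
    y, body = phi[1], phi[2]                   # 'forall' or 'exists'
    if y == x:
        return phi                             # x is bound here: nothing to replace
    if y in term_vars(tau) and x in free_vars(body):
        # Capture would occur: rename the bound variable y to a fresh one.
        z = fresh(term_vars(tau) | free_vars(body) | {x})
        body, y = subst(body, y, z), z
    return (tag, y, subst(body, x, tau))


# The example above: substitute x + y for z in  forall x exists y P(x, y, z).
phi = ('forall', 'x', ('exists', 'y', ('atom', 'P', ('x', 'y', 'z'))))
result = subst(phi, 'z', ('+', 'x', 'y'))
print(result)
# ('forall', 'u0', ('exists', 'u1', ('atom', 'P', ('u0', 'u1', ('+', 'x', 'y')))))
```

The bound variables are renamed to `u0` and `u1` (playing the roles of $u$ and $v$ in the text), and the variables $x$ and $y$ of the substituted term stay free.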

Finally, here are the inference rules for the quantifiers, first stated in a natural deduction style and then in sequent style. It is assumed that we use two disjoint sets of variables for labeling premises ($x, y, \ldots$) and free variables ($t, u, v, \ldots$). As we will see, the $\forall$-introduction rule and the $\exists$-elimination rule involve a crucial restriction on the occurrences of certain variables. Remember, variables are terms!

Definition 1.6.1 The inference rules for the quantifiers are

$\forall$-introduction:

$$\dfrac{\begin{array}{c}\Gamma\\ \vdots\\ P[u/t]\end{array}}{\forall tP}$$

Here, $u$ must be a variable that does not occur free in any of the propositions in $\Gamma$ or in $\forall tP$; the notation $P[u/t]$ stands for the result of substituting $u$ for all free occurrences of $t$ in $P$.

$\forall$-elimination:

$$\dfrac{\begin{array}{c}\Gamma\\ \vdots\\ \forall tP\end{array}}{P[\tau/t]}$$

Here $\tau$ is an arbitrary term and it is assumed that bound variables in $P$ have been renamed so that none of the variables in $\tau$ are captured after substitution.

$\exists$-introduction:

$$\dfrac{\begin{array}{c}\Gamma\\ \vdots\\ P[\tau/t]\end{array}}{\exists tP}$$

As in $\forall$-elimination, $\tau$ is an arbitrary term and the same proviso on bound variables in $P$ applies.

$\exists$-elimination:

$$\dfrac{\begin{array}{c}\Gamma\\ \vdots\\ \exists tP\end{array} \qquad \begin{array}{c}\Delta, P[u/t]^x\\ \vdots\\ C\end{array}}{C}\;x$$

Here, $u$ must be a variable that does not occur free in any of the propositions in $\Delta$, $\exists tP$, or $C$, and all premises $P[u/t]$ labeled $x$ are discharged.

In the above rules, $\Gamma$ or $\Delta$ may be empty, $P$ and $C$ denote arbitrary propositions constructed from a first-order language, $L$, and $t$ is any variable. The system of first-order classical logic, $\mathcal{N}_c^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$, is obtained by adding the above rules to the system of propositional classical logic $\mathcal{N}_c^{\Rightarrow,\vee,\wedge,\perp}$. The system of first-order intuitionistic logic, $\mathcal{N}_i^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$, is obtained by adding the above rules to the system of propositional intuitionistic logic $\mathcal{N}_i^{\Rightarrow,\vee,\wedge,\perp}$.



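Using sequents, the quantifier rules in first-order logic are expressed as follows:

Definition 1.6.2 The inference rules for the quantifiers in Gentzen-sequent style are

$$\dfrac{\Gamma \to P[u/t]}{\Gamma \to \forall tP}\ (\forall\text{-intro}) \qquad \dfrac{\Gamma \to \forall tP}{\Gamma \to P[\tau/t]}\ (\forall\text{-elim})$$

where in ($\forall$-intro), $u$ does not occur free in $\Gamma$ or $\forall tP$;

$$\dfrac{\Gamma \to P[\tau/t]}{\Gamma \to \exists tP}\ (\exists\text{-intro}) \qquad \dfrac{\Gamma \to \exists tP \qquad z\colon P[u/t], \Gamma \to C}{\Gamma \to C}\ (\exists\text{-elim})$$

where in ($\exists$-elim), $u$ does not occur free in $\Gamma$, $\exists tP$, or $C$. Again, $t$ is any variable.

The variable $u$ is called the eigenvariable of the inference. The systems $\mathcal{NG}_c^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$ and $\mathcal{NG}_i^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$ are defined from the systems $\mathcal{NG}_c^{\Rightarrow,\vee,\wedge,\perp}$ and $\mathcal{NG}_i^{\Rightarrow,\vee,\wedge,\perp}$, respectively, by adding the above rules.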



When we say that a proposition, $P$, is provable from $\Gamma$, we mean that we can construct a proof tree whose conclusion is $P$ and whose set of premises is $\Gamma$, in one of the systems $\mathcal{N}_c^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$ or $\mathcal{NG}_c^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$. Therefore, as in propositional logic, when we use the word “provable” unqualified, we mean provable in classical logic. Otherwise, we say intuitionistically provable.

A first look at the above rules shows that universal formulae, $\forall tP$, behave somewhat like infinite conjunctions and that existential formulae, $\exists tP$, behave somewhat like infinite disjunctions.

The $\forall$-introduction rule looks a little strange but the idea behind it is actually very simple: Since $u$ is totally unconstrained, if $P[u/t]$ is provable (from $\Gamma$), then intuitively $P[u/t]$ holds of any arbitrary object, and so, the statement $\forall tP$ should also be provable (from $\Gamma$).

The meaning of the $\forall$-elimination rule is that if $\forall tP$ is provable (from $\Gamma$), then $P$ holds for all objects and so, in particular, for the object denoted by the term $\tau$, i.e., $P[\tau/t]$ should be provable (from $\Gamma$).

The $\exists$-introduction rule is dual to the $\forall$-elimination rule. If $P[\tau/t]$ is provable (from $\Gamma$), this means that the object denoted by $\tau$ satisfies $P$, so $\exists tP$ should be provable (this latter formula asserts the existence of some object satisfying $P$, and $\tau$ is such an object).

The $\exists$-elimination rule is reminiscent of the $\vee$-elimination rule and is a little more tricky. It goes as follows: Suppose that we proved $\exists tP$ (from $\Gamma$). Moreover, suppose that for every possible case, $P[u/t]$, we were able to prove $C$ (from $\Gamma$). Then, as we have “exhausted” all possible cases and as we know from the provability of $\exists tP$ that some case must hold, we can conclude that $C$ is provable (from $\Gamma$) without using $P[u/t]$ as a premise.

Like the $\vee$-elimination rule, the $\exists$-elimination rule is not very constructive. It allows drawing a conclusion ($C$) by considering alternatives without knowing which one actually occurs.

Remark: Analogously to disjunction, in (first-order) intuitionistic logic, if an existential statement $\exists tP$ is provable (without premises), then from any proof of $\exists tP$, some term, $\tau$, can be extracted so that $P[\tau/t]$ is provable. Such a term, $\tau$, is called a witness. The witness property is not easy to prove. It follows from the fact that intuitionistic proofs have a normal form (see Section 1.7). However, no such property holds in classical logic (for instance, see the example of $a^b$ rational with $a, b$ irrational, revisited below).

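Here is an example of a proof in the system $\mathcal{N}_c^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$ (actually, in $\mathcal{N}_i^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$) of the formula $\forall t(P \wedge Q) \Rightarrow \forall tP \wedge \forall tQ$.

$$\dfrac{\dfrac{\dfrac{\dfrac{\dfrac{(\forall t(P \wedge Q))^x}{P[u/t] \wedge Q[u/t]}}{P[u/t]}}{\forall tP} \qquad \dfrac{\dfrac{\dfrac{(\forall t(P \wedge Q))^x}{P[u/t] \wedge Q[u/t]}}{Q[u/t]}}{\forall tQ}}{\forall tP \wedge \forall tQ}}{\forall t(P \wedge Q) \Rightarrow \forall tP \wedge \forall tQ}\;x$$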

In the above proof, u is a new variable, i.e., a variable that does not occur free in P or Q. 

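The reader should show that $\forall tP \wedge \forall tQ \Rightarrow \forall t(P \wedge Q)$ is also provable in $\mathcal{N}_i^{\Rightarrow,\vee,\wedge,\perp,\forall,\exists}$. However, in general, one can’t just replace $\forall$ by $\exists$ (or $\wedge$ by $\vee$) and still obtain provable statements. For example, $\exists tP \wedge \exists tQ \Rightarrow \exists t(P \wedge Q)$ is not provable at all!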
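Anticipating the semantic method of Section 1.7 (by soundness, a provable formula is true in every interpretation), one interpretation in which the formula is false is enough to show that it is not provable. Here is a minimal Python sketch of such an interpretation for $\exists tP \wedge \exists tQ \Rightarrow \exists t(P \wedge Q)$. The two-element domain and the choice of $P$ true only of 0 and $Q$ true only of 1 are ours, for illustration.

```python
# A two-element interpretation chosen for illustration:
# P holds only of 0, Q holds only of 1.
domain = [0, 1]
P = {0}
Q = {1}

exists_P = any(d in P for d in domain)                    # exists t P
exists_Q = any(d in Q for d in domain)                    # exists t Q
exists_P_and_Q = any(d in P and d in Q for d in domain)   # exists t (P and Q)

premise = exists_P and exists_Q
implication = (not premise) or exists_P_and_Q

print(premise, exists_P_and_Q)  # True False
print(implication)              # False
```

Each conjunct of the premise has its own witness, but no single object witnesses both, which is exactly what the conclusion would require.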

Here are some useful equivalences involving quantifiers. The first two are analogous to the de Morgan laws for $\wedge$ and $\vee$.

Proposition 1.6.3 The following equivalences are provable in classical first-order logic:

$$\begin{aligned}
\neg\forall tP &\equiv \exists t\neg P\\
\neg\exists tP &\equiv \forall t\neg P\\
\forall t(P \wedge Q) &\equiv \forall tP \wedge \forall tQ\\
\exists t(P \vee Q) &\equiv \exists tP \vee \exists tQ.
\end{aligned}$$

In fact, the last three and $\exists t\neg P \Rightarrow \neg\forall tP$ are provable intuitionistically. Moreover, the propositions $\exists t(P \wedge Q) \Rightarrow \exists tP \wedge \exists tQ$ and $\forall tP \vee \forall tQ \Rightarrow \forall t(P \vee Q)$ are provable in intuitionistic first-order logic (and thus, also in classical first-order logic).

Proof . Left as an exercise to the reader. □ 
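Before attempting the exercise, one can gain some confidence in Proposition 1.6.3 by testing the four equivalences in every interpretation of unary predicates $P$ and $Q$ over small finite domains, using that $\forall tP$ holds iff the extension of $P$ is the whole domain and $\exists tP$ holds iff it is nonempty. This Python sketch is only a sanity check, not a proof: it assumes $P$ and $Q$ are unary in $t$, covers domains of at most `max_size` elements (a parameter we introduce), and says nothing about infinite domains or about intuitionistic provability.

```python
from itertools import product


def extensions(domain):
    """Every subset of `domain`, i.e. every interpretation of a unary predicate."""
    for bits in product([False, True], repeat=len(domain)):
        yield frozenset(d for d, keep in zip(domain, bits) if keep)


def check_prop_1_6_3(max_size=3):
    """Check the four equivalences in every interpretation of P and Q over the
    domains {0}, {0, 1}, ..., up to max_size elements (domains are nonempty)."""
    for n in range(1, max_size + 1):
        domain = list(range(n))
        D = frozenset(domain)
        for P in extensions(domain):
            for Q in extensions(domain):
                not_P = D - P
                # forall t A holds iff A's extension is all of D;
                # exists t A holds iff A's extension is nonempty.
                checks = [
                    (P != D) == bool(not_P),                  # ~forall tP == exists t ~P
                    (not P) == (not_P == D),                  # ~exists tP == forall t ~P
                    ((P & Q) == D) == (P == D and Q == D),    # forall t(P and Q)
                    bool(P | Q) == (bool(P) or bool(Q)),      # exists t(P or Q)
                ]
                if not all(checks):
                    return False
    return True


print(check_prop_1_6_3())  # True
```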

Remark: We can illustrate, again, the fact that classical logic allows for non-constructive proofs by reexamining the example at the end of Section 1.3. There, we proved that if $\sqrt{2}^{\sqrt{2}}$ is rational, then $a = \sqrt{2}$ and $b = \sqrt{2}$ are both irrational numbers such that $a^b$ is rational, and if $\sqrt{2}^{\sqrt{2}}$ is irrational, then $a = \sqrt{2}^{\sqrt{2}}$ and $b = \sqrt{2}$ are both irrational numbers such that $a^b$ is rational. By $\exists$-introduction, we deduce that if $\sqrt{2}^{\sqrt{2}}$ is rational, then there exist some irrational numbers $a, b$ so that $a^b$ is rational, and if $\sqrt{2}^{\sqrt{2}}$ is irrational, then there exist some irrational numbers $a, b$ so that $a^b$ is rational. In classical logic, as $P \vee \neg P$ is provable, by $\vee$-elimination, we just proved that there exist some irrational numbers $a$ and $b$ so that $a^b$ is rational.

However, this argument does not explicitly give us numbers $a$ and $b$ with the required properties! It only tells us that such numbers must exist. Now, it turns out that $\sqrt{2}^{\sqrt{2}}$ is indeed irrational (this follows from the Gel’fond-Schneider Theorem, a hard theorem in number theory). Furthermore, there are also simpler explicit solutions such as $a = \sqrt{2}$ and $b = \log_2 9$, as the reader should check!
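For the explicit solution, the computation is $a^b = (2^{1/2})^{\log_2 9} = 2^{\frac{1}{2}\log_2 9} = 2^{\log_2 3} = 3$. Moreover, $b$ is irrational: if $\log_2 9 = p/q$ with positive integers $p, q$, then $2^p = 9^q$, an even number equal to an odd one. A quick floating-point check in Python (numerical evidence only, subject to rounding):

```python
import math

a = math.sqrt(2)      # irrational
b = math.log2(9)      # irrational: 2**p == 9**q has no solution with p, q >= 1
print(a ** b)         # very close to 3, up to floating-point rounding
```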

We conclude this section by giving an example of a “wrong proof”. Here is an example in which the $\forall$-introduction rule is applied illegally, and thus, yields a statement which is actually false (not provable). In the incorrect “proof” below, $P$ is an atomic predicate symbol taking two arguments (for example, “parent”) and $0$ is a constant denoting zero:

$$\dfrac{\dfrac{\dfrac{\dfrac{P(t, 0)^x}{\forall tP(t, 0)}\ \text{illegal step!}}{P(t, 0) \Rightarrow \forall tP(t, 0)}\;x}{\forall t(P(t, 0) \Rightarrow \forall tP(t, 0))}}{P(0, 0) \Rightarrow \forall tP(t, 0)}$$

The problem is that the variable $t$ occurs free in the premise $P(t, 0)$, and therefore, the application of the $\forall$-introduction rule in the first step is illegal. However, note that this premise is discharged in the second step, and so, the application of the $\forall$-introduction rule in the third step is legal. The (false) conclusion of this faulty proof is that $P(0, 0) \Rightarrow \forall tP(t, 0)$ is provable. Indeed, there are plenty of properties such that the fact that the single instance, $P(0, 0)$, holds does not imply that $P(t, 0)$ holds for all $t$.
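A concrete counter-model makes the falsity of the conclusion explicit. In this Python sketch the two-element domain and the interpretation of $P$ (true only of the pair $(0, 0)$) are our own choices for illustration.

```python
# Hypothetical interpretation over the domain {0, 1}: the binary predicate P
# holds only of the pair (0, 0).
domain = [0, 1]
P = {(0, 0)}

antecedent = (0, 0) in P                          # P(0, 0)
consequent = all((t, 0) in P for t in domain)     # forall t P(t, 0)
print(antecedent, consequent)                     # True False
print((not antecedent) or consequent)             # False
```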



Remark: The above example shows why it is desirable to have premises that are universally quantified. A premise of the form $\forall tP$ can be instantiated to $P[u/t]$, using $\forall$-elimination, where $u$ is a brand new variable. Later on, it may be possible to use $\forall$-introduction without running into trouble with free occurrences of $u$ in the premises. But we still have to be very careful when we use $\forall$-introduction or $\exists$-elimination.

Before concluding this section, let us give a few more examples of proofs using the rules for the quantifiers. First, let us prove that

$\forall tP \equiv \forall uP[u/t]$,

where $u$ is any variable not free in $\forall tP$ and such that $u$ is not captured during the substitution. This rule allows us to rename bound variables (under very mild conditions). We have the proofs

$$\dfrac{\dfrac{\dfrac{(\forall tP)^a}{P[u/t]}}{\forall uP[u/t]}}{\forall tP \Rightarrow \forall uP[u/t]}\;a$$

and

$$\dfrac{\dfrac{\dfrac{(\forall uP[u/t])^a}{P[u/t]}}{\forall tP}}{\forall uP[u/t] \Rightarrow \forall tP}\;a$$



Now, we give a proof (intuitionistic) of

$\exists t(P \Rightarrow Q) \Rightarrow (\forall tP \Rightarrow Q)$,

where $t$ does not occur (free or bound) in $Q$.







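$$\dfrac{\dfrac{\dfrac{(\exists t(P \Rightarrow Q))^z \qquad \dfrac{(P[u/t] \Rightarrow Q)^x \qquad \dfrac{(\forall tP)^y}{P[u/t]}}{Q}}{Q}\;x}{\forall tP \Rightarrow Q}\;y}{\exists t(P \Rightarrow Q) \Rightarrow (\forall tP \Rightarrow Q)}\;z$$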

In the above proof, $u$ is a new variable that does not occur in $Q$, $\forall tP$, or $\exists t(P \Rightarrow Q)$. The converse requires (RAA) and is a bit more complicated. To conclude, we give a proof (intuitionistic) of

$(\forall tP \vee Q) \Rightarrow \forall t(P \vee Q)$,

where $t$ does not occur (free or bound) in $Q$.

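$$\dfrac{\dfrac{(\forall tP \vee Q)^z \qquad \dfrac{\dfrac{\dfrac{(\forall tP)^x}{P[u/t]}}{P[u/t] \vee Q}}{\forall t(P \vee Q)} \qquad \dfrac{\dfrac{Q^y}{P[u/t] \vee Q}}{\forall t(P \vee Q)}}{\forall t(P \vee Q)}\;x,y}{(\forall tP \vee Q) \Rightarrow \forall t(P \vee Q)}\;z$$

In the above proof, $u$ is a new variable that does not occur in $\forall tP$ or $Q$. The converse requires (RAA).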

Several times in this chapter, we have claimed that certain propositions are not provable in some logical system. What kind of reasoning do we use to validate such claims? In the next section, we briefly address this question as well as related ones.

1.7 Decision Procedures, Proof Normalization, Counter-Examples, Theories, etc.

In the previous sections, we saw how the rules of mathematical reasoning can be formalized in various natural deduction systems, and we defined a precise notion of proof. We observed that finding a proof for a given proposition was not a simple matter, nor was it a simple matter to ascertain that a proposition is unprovable. Thus, it is natural to ask the following question:

The Decision Problem: Is there a general procedure which takes any arbitrary proposition, $P$, as input, always terminates in a finite number of steps, and tells us whether $P$ is provable or not?
Clearly, it would be very nice if such a procedure existed, especially if it also produced a proof of $P$ when $P$ is provable.

Unfortunately, for rich enough languages, such as first-order logic, it is impossible to find such a procedure. This deep result, known as the undecidability of the decision problem or Church’s Theorem, was proved by A. Church in 1936. (Actually, Church proved the undecidability of the validity problem, but by Gödel’s completeness Theorem, validity and provability are equivalent.)

Proving Church’s Theorem is hard and requires a lot of work. One needs to develop a good deal of what is called the theory of computation. This involves defining models of computation such as Turing machines and proving other deep results such as the undecidability of the halting problem and the undecidability of the Post Correspondence Problem, among other things. Some of this material is covered in CSE262, so be patient and your curiosity will be satisfied!

So, our hopes to find a “universal theorem prover” are crushed. However, if we restrict 
ourselves to propositional logic, classical or intuitionistic, it turns out that procedures solving 
the decision problem do exist and they even produce a proof of the input proposition when 
that proposition is provable. 

Unfortunately, proving that such procedures exist and are correct in the propositional case is rather difficult, especially for intuitionistic logic. The difficulties have a lot to do with our choice of a natural deduction system. Indeed, even for the system $\mathcal{N}_i^{\Rightarrow}$ (or $\mathcal{NG}_i^{\Rightarrow}$), provable propositions may have infinitely many proofs. This makes the search process impossible; how do we know when to stop, especially if a proposition is not provable? The problem is that proofs may contain redundancies (Gentzen said “detours”). A typical example of redundancy is an elimination immediately following an introduction, as in the following example, in which $\mathcal{D}_1$ denotes a deduction with conclusion $\Gamma, x\colon A \to B$ and $\mathcal{D}_2$ denotes a deduction with conclusion $\Gamma \to A$.



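$$\dfrac{\dfrac{\begin{array}{c}\mathcal{D}_1\\ \Gamma, x\colon A \to B\end{array}}{\Gamma \to A \Rightarrow B} \qquad \begin{array}{c}\mathcal{D}_2\\ \Gamma \to A\end{array}}{\Gamma \to B}$$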

Intuitively, it should be possible to construct a deduction for $\Gamma \to B$ from the two deductions $\mathcal{D}_1$ and $\mathcal{D}_2$ without using the hypothesis $x\colon A$ at all. This is indeed the case. If we look closely at the deduction $\mathcal{D}_1$, from the shape of the inference rules, assumptions are never created, and the leaves must be labeled with expressions of the form $\Gamma', \Delta, x\colon A, y\colon C \to C$ or $\Gamma, \Delta, x\colon A \to A$, where $y \neq x$ and either $\Gamma = \Gamma'$ or $\Gamma = \Gamma', y\colon C$. We can form a new deduction for $\Gamma \to B$ as follows: in $\mathcal{D}_1$, wherever a leaf of the form $\Gamma, \Delta, x\colon A \to A$ occurs, replace it by the deduction obtained from $\mathcal{D}_2$ by adding $\Delta$ to the premise of each sequent in $\mathcal{D}_2$. Actually, one should be careful to first make a fresh copy of $\mathcal{D}_2$ by renaming all the variables so that clashes with variables in $\mathcal{D}_1$ are avoided. Finally, delete the assumption $x\colon A$ from the premise of every sequent in the resulting proof. The resulting deduction is obtained by a kind of substitution and may be denoted as $\mathcal{D}_1[\mathcal{D}_2/x]$, with some minor abuse of notation. Note that the assumptions $x\colon A$ occurring in the leaves of the form $\Gamma', \Delta, x\colon A, y\colon C \to C$ were never used anyway. The step which consists in transforming the above redundant proof figure into the deduction $\mathcal{D}_1[\mathcal{D}_2/x]$ is called a reduction step or normalization step.

The idea of proof normalization goes back to Gentzen ([20], 1935). Gentzen noted that (formal) proofs can contain redundancies, or “detours”, and that most complications in the analysis of proofs are due to these redundancies. Thus, Gentzen had the idea that the analysis of proofs would be simplified if it were possible to show that every proof can be converted to an equivalent irredundant proof, a proof in normal form. Gentzen proved a technical result to that effect, the “cut-elimination theorem”, for a sequent-calculus formulation of first-order logic [20]. Cut-free proofs are direct, in the sense that they never use auxiliary lemmas via the cut rule.

Remark: It is important to note that Gentzen’s result gives a particular algorithm to produce a proof in normal form. Thus, we know that every proof can be reduced to some normal form using a specific strategy, but there may be more than one normal form, and certain normalization strategies may not terminate.

About thirty years later, Prawitz ([35], 1965) reconsidered the issue of proof normalization, but in the framework of natural deduction rather than the framework of sequent calculi.¹ Prawitz explained very clearly what redundancies are in systems of natural deduction, and he proved that every proof can be reduced to a normal form. Furthermore, this normal form is unique. A few years later, Prawitz ([36], 1971) showed that in fact, every reduction sequence terminates, a property also called strong normalization.

A remarkable connection between proof normalization and the notion of computation must also be mentioned. Curry (1958) made the remarkably insightful observation that certain typed combinators can be viewed as representations of proofs (in a Hilbert system) of certain propositions. (See Curry and Feys [12] (1958), Chapter 9E, pages 312-315.) Building on this observation, Howard ([29], 1969) described a general correspondence between propositions and types, proofs in natural deduction and certain typed $\lambda$-terms, and proof normalization and $\beta$-reduction. (The simply-typed $\lambda$-calculus was invented by Church in 1940.) This correspondence, usually referred to as the Curry/Howard isomorphism or formulae-as-types principle, is fundamental and very fruitful.
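Under this correspondence, the detour in the redundant proof figure above (an $\Rightarrow$-introduction immediately followed by an $\Rightarrow$-elimination) is a $\beta$-redex $(\lambda x.\, M)\, N$, and the reduction step $\mathcal{D}_1[\mathcal{D}_2/x]$ is the substitution $M[N/x]$. The Python sketch below normalizes $\lambda$-terms by repeated $\beta$-reduction. It uses de Bruijn indices instead of named variables (a representation swapped in here to sidestep the renaming issues of Section 1.6), performs no type checking, and is only guaranteed to terminate on well-typed inputs, in line with the strong normalization result above. All names are hypothetical.

```python
# Lambda-terms with de Bruijn indices (an encoding chosen for this sketch):
#   ('var', i)     a variable, i = number of binders between it and its binder
#   ('lam', body)  lambda-abstraction
#   ('app', f, a)  application


def shift(t, d, cutoff=0):
    """Add d to every variable index >= cutoff (the variables free at this depth)."""
    if t[0] == 'var':
        return ('var', t[1] + d) if t[1] >= cutoff else t
    if t[0] == 'lam':
        return ('lam', shift(t[1], d, cutoff + 1))
    return ('app', shift(t[1], d, cutoff), shift(t[2], d, cutoff))


def subst(t, j, s):
    """Replace variable j by the term s in t."""
    if t[0] == 'var':
        return s if t[1] == j else t
    if t[0] == 'lam':
        return ('lam', subst(t[1], j + 1, shift(s, 1)))
    return ('app', subst(t[1], j, s), subst(t[2], j, s))


def beta(body, arg):
    """Contract the redex (lam body) arg, i.e. compute body[arg/x]."""
    return shift(subst(body, 0, shift(arg, 1)), -1)


def normalize(t):
    """Normalize the function and argument first, then contract any redex formed."""
    if t[0] == 'var':
        return t
    if t[0] == 'lam':
        return ('lam', normalize(t[1]))
    f, a = normalize(t[1]), normalize(t[2])
    if f[0] == 'lam':
        return normalize(beta(f[1], a))
    return ('app', f, a)


# Detour: D1 proves  Gamma, x: A -> A  by the hypothesis x, and D2 proves
# Gamma -> A  by a hypothesis y: A of Gamma. The proof term is (lambda x. x) y,
# where the free variable y has index 0.
detour = ('app', ('lam', ('var', 0)), ('var', 0))
print(normalize(detour))      # ('var', 0): the normal proof just uses y

# (lambda x. lambda z. x) y  reduces to  lambda z. y  (y has index 1 under one binder)
k_applied = ('app', ('lam', ('lam', ('var', 1))), ('var', 0))
print(normalize(k_applied))   # ('lam', ('var', 1))
```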

The Curry/Howard isomorphism establishes a deep correspondence between the notion of proof and the notion of computation. Furthermore, and this is the deepest aspect of the Curry/Howard isomorphism, proof normalization corresponds to term reduction in the $\lambda$-calculus associated with the proof system. To make a long story short, the correspondence between proofs in intuitionistic logic and typed $\lambda$-terms on the one hand, and between proof normalization and $\beta$-conversion on the other hand, can be used to translate results about typed $\lambda$-terms into results about proofs in intuitionistic logic. By the way, some aspects of the Curry/Howard isomorphism are covered in CIS500.

¹This is somewhat ironic, since Gentzen began his investigations using a natural deduction system, but decided to switch to sequent calculi (known as Gentzen systems!) for technical reasons.

In summary, using either some suitable intuitionistic sequent calculi and Gentzen’s cut-elimination theorem or some suitable typed $\lambda$-calculi and (strong) normalization results about them, it is possible to prove that there is a decision procedure for propositional intuitionistic logic. However, it can also be shown that the time-complexity of any such procedure is very high. Here, we are alluding to complexity theory, another active area of computer science. You will learn about some basic and fundamental aspects of this theory in CSE262 when you learn about the two classes P and NP.

Readers who wish to learn more about these topics can read my two survey papers Gallier [17] (on the Correspondence Between Proofs and $\lambda$-Terms) and Gallier [16] (A Tutorial on Proof Systems and Typed $\lambda$-Calculi), both available on the web site

http://www.cis.upenn.edu/~jean/gbooks/logic.html

and the excellent introduction to proof theory by Troelstra and Schwichtenberg [41].

Anybody who really wants to understand logic should of course take a look at Kleene 
[31] (the famous “I.M.”), but this is not recommended to beginners! 

Let us return to the question of deciding whether a proposition is not provable. To simplify the discussion, let us restrict our attention to propositional classical logic. So far, we have presented a very proof-theoretic view of logic, that is, a view based on the notion of provability as opposed to a more semantic view based on the notions of truth and models. A possible excuse for our bias is that, as Peter Andrews (from CMU) puts it, “truth is elusive”. Therefore, it is simpler to understand what truth is in terms of the more “mechanical” notion of provability. (Peter Andrews even gave the subtitle

To Truth Through Proof

to his logic book Andrews [1]!)

However, mathematicians are not mechanical theorem provers (even if they prove lots of stuff)! Indeed, mathematicians almost always think of the objects they deal with (functions, curves, surfaces, groups, rings, etc.) as rather concrete objects (even if they may not seem concrete to the uninitiated) and not as abstract entities solely characterized by arcane axioms.

It is indeed natural and fruitful to try to interpret formal statements semantically. For propositional classical logic, this can be done quite easily if we interpret atomic propositional letters using the truth values true and false. Then, the crucial point is that every provable proposition (say in $\mathcal{NG}_c^{\Rightarrow,\vee,\wedge,\perp}$) has the value true no matter how we assign truth values to the letters in our proposition. A proposition, $P$, that has the value true under every such truth assignment is said to be valid.

The fact that provable implies valid is called soundness or consistency of the proof system. The soundness of the proof system $\mathcal{NG}_c^{\Rightarrow,\vee,\wedge,\perp}$ is easy to prove. For this, given any sequent, $\Gamma \to P$, we prove that whenever all the propositions in $\Gamma$ are assigned the value true, then $P$ evaluates to true. This is easy to do: check that this holds for the axioms and that whenever it holds for the premise(s) of an inference rule, then it holds for the conclusion.

We now have a method to show that a proposition, $P$, is not provable: Find some truth assignment that makes $P$ false.

Such an assignment falsifying $P$ is called a counter-example. If $P$ has a counter-example, then it can’t be provable because if it were, then by soundness it would be true for all possible truth assignments.
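For propositional classical logic, the search for a counter-example can be done by brute force: try all $2^n$ truth assignments to the $n$ letters of $P$. Here is a minimal Python sketch; the tuple encoding of propositions and the function names are assumptions for this illustration. Combined with soundness and with the completeness result discussed below, this gives a (slow) decision procedure for classical propositional provability. It says nothing about intuitionistic provability: Peirce’s law, tested below, has no counter-example, yet it is not intuitionistically provable.

```python
from itertools import product

# Assumed encoding: a str is a propositional letter; compound propositions are
# ('bot',), ('not', A), ('and', A, B), ('or', A, B), ('imp', A, B).


def letters(phi):
    """Propositional letters occurring in phi."""
    if isinstance(phi, str):
        return {phi}
    return set().union(*(letters(sub) for sub in phi[1:]))


def value(phi, v):
    """Truth value of phi under the assignment v (a dict letter -> bool)."""
    if isinstance(phi, str):
        return v[phi]
    tag = phi[0]
    if tag == 'bot':
        return False
    if tag == 'not':
        return not value(phi[1], v)
    a, b = value(phi[1], v), value(phi[2], v)
    if tag == 'and':
        return a and b
    if tag == 'or':
        return a or b
    if tag == 'imp':
        return (not a) or b
    raise ValueError(f"unknown connective: {tag!r}")


def counter_example(phi):
    """A truth assignment making phi false, or None if phi is valid.
    All 2**n assignments may be tried, so the cost is exponential in n."""
    names = sorted(letters(phi))
    for bits in product([True, False], repeat=len(names)):
        v = dict(zip(names, bits))
        if not value(phi, v):
            return v
    return None


excluded_middle = ('or', 'P', ('not', 'P'))
peirce = ('imp', ('imp', ('imp', 'P', 'Q'), 'P'), 'P')
converse = ('imp', ('imp', 'P', 'Q'), ('imp', 'Q', 'P'))

print(counter_example(excluded_middle))  # None
print(counter_example(peirce))           # None
print(counter_example(converse))         # {'P': False, 'Q': True}
```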

But now, another question comes up: If a proposition is not provable, can we always find a counter-example for it? Equivalently, is every valid proposition provable? If every valid proposition is provable, we say that our proof system is complete (this is the completeness of our system).

The system $\mathcal{NG}_c^{\Rightarrow,\vee,\wedge,\perp}$ is indeed complete. In fact, all the classical systems that we have discussed are sound and complete. Completeness is usually a lot harder to prove than soundness. For first-order classical logic, this is known as Gödel’s completeness Theorem (1929). Again, we refer our readers to Gallier [18], van Dalen [42] or Huth and Ryan [30] for a thorough discussion of these matters. In the first-order case, one has to define first-order structures (or first-order models).

What about intuitionistic logic?

Well, one has to come up with a richer notion of semantics because it is no longer true that if a proposition is valid (in the sense of our two-valued semantics using true, false), then it is provable. Several semantics have been given for intuitionistic logic. In our opinion, the most natural is the notion of Kripke model. Then, again, soundness and completeness hold for intuitionistic proof systems, even in the first-order case (see van Dalen [42]).

In summary, semantic models can be used to provide counter-examples of unprovable propositions. This is a quick method to establish that a proposition is not provable.

The way we presented deduction trees and proof trees may have given our readers the impression that the set of premises, $\Gamma$, was just an auxiliary notion. Indeed, in all of our examples, $\Gamma$ ends up being empty! However, nonempty $\Gamma$’s are crucially needed if we want to develop theories about various kinds of structures and objects, such as the natural numbers, groups, rings, fields, trees, graphs, sets, etc. Indeed, we need to make definitions about the objects we want to study, and we need to state some axioms asserting the main properties of these objects. We do this by putting these definitions and axioms in $\Gamma$. Actually, we have to allow $\Gamma$ to be infinite, but we still require that our deduction trees are finite; they can only use finitely many of the propositions in $\Gamma$. We are then interested in all propositions, $P$, such that $\Delta \to P$ is provable, where $\Delta$ is any finite subset of $\Gamma$; the set of all such $P$’s is called a theory. Of course, we have the usual problem of consistency: If we are not careful, our theory may be inconsistent, i.e., it may consist of all propositions.

Let us give two examples of theories. 
Our first example is the theory of equality. Indeed, our readers may have noticed that we have avoided dealing with the equality relation. In practice, we can’t do that.

Given a language, $L$, with a given supply of constant, function and predicate symbols, the theory of equality consists of the following formulae taken as axioms:

$$\begin{aligned}
&\forall x(x = x)\\
&\forall x_1 \cdots \forall x_n \forall y_1 \cdots \forall y_n [(x_1 = y_1 \wedge \cdots \wedge x_n = y_n) \Rightarrow f(x_1, \ldots, x_n) = f(y_1, \ldots, y_n)]\\
&\forall x_1 \cdots \forall x_n \forall y_1 \cdots \forall y_n [(x_1 = y_1 \wedge \cdots \wedge x_n = y_n) \wedge P(x_1, \ldots, x_n) \Rightarrow P(y_1, \ldots, y_n)],
\end{aligned}$$

for all function symbols (of $n$ arguments) and all predicate symbols (of $n$ arguments), including the equality predicate, $=$, itself.

It is not immediately clear from the above axioms that $=$ is symmetric and transitive, but this can be shown easily.
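One way to see this is to instantiate the predicate axiom with $P$ taken to be $=$ itself ($n = 2$) and then use reflexivity. The derivation below is a sketch of that argument; the particular instantiations are our choice.

```latex
\begin{align*}
&\text{Axiom for } P = {=},\ n = 2:\quad
  \forall x_1 \forall x_2 \forall y_1 \forall y_2\,
  [(x_1 = y_1 \wedge x_2 = y_2) \wedge x_1 = x_2 \Rightarrow y_1 = y_2]\\
&\text{Symmetry, } (x_1, y_1, x_2, y_2) := (x, y, x, x):\quad
  (x = y \wedge x = x) \wedge x = x \Rightarrow y = x\\
&\text{Transitivity, } (x_1, y_1, x_2, y_2) := (x, x, y, z):\quad
  (x = x \wedge y = z) \wedge x = y \Rightarrow x = z
\end{align*}
```

In both instances, the conjuncts of the form $x = x$ are discharged by $\forall$-elimination on the reflexivity axiom $\forall x(x = x)$. This leaves $x = y \Rightarrow y = x$ and $x = y \wedge y = z \Rightarrow x = z$, which can then be generalized by $\forall$-introduction, since the axioms used have no free variables.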

Our second example is the first-order theory of the natural numbers known as Peano ’s 
arithmetic. 

Here, we have the constant 0 (zero), the unary function symbol S (for successor function;
the intended meaning is S(n) = n + 1) and the binary function symbols + (for addition)
and * (for multiplication). In addition to the axioms for the theory of equality we have the
following axioms:

$$\forall x\,\neg(S(x) = 0)$$

$$\forall x \forall y\,(S(x) = S(y) \Rightarrow x = y)$$

$$\forall x\,(x + 0 = x)$$

$$\forall x \forall y\,(x + S(y) = S(x + y))$$

$$\forall x\,(x * 0 = 0)$$

$$\forall x \forall y\,(x * S(y) = x * y + x)$$

$$[A(0) \land \forall x\,(A(x) \Rightarrow A(S(x)))] \Rightarrow \forall n\,A(n),$$

where A is any first-order formula with one free variable. This last axiom is the induction 
axiom. Observe how + and * are defined recursively in terms of 0 and S and that there are 
infinitely many induction axioms (countably many). 
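To see the recursive equations for + and * at work, here is a small Python sketch. The names `Nat`, `ZERO`, `S`, `add`, `mul`, `from_int` and `to_int` are hypothetical, chosen for illustration; a numeral is encoded as a chain of successors ending in 0. Evaluating terms this way only illustrates the intended interpretation of the symbols; it says nothing about what is provable in Peano’s arithmetic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Nat:
    # A unary numeral: pred is None for 0, otherwise pred is n in S(n).
    pred: Optional["Nat"] = None

ZERO = Nat()

def S(n: Nat) -> Nat:
    return Nat(n)

def add(x: Nat, y: Nat) -> Nat:
    if y.pred is None:              # x + 0 = x
        return x
    return S(add(x, y.pred))        # x + S(y) = S(x + y)

def mul(x: Nat, y: Nat) -> Nat:
    if y.pred is None:              # x * 0 = 0
        return ZERO
    return add(mul(x, y.pred), x)   # x * S(y) = x * y + x

def from_int(k: int) -> Nat:
    # Build S(S(...S(0)...)) with k applications of S.
    n = ZERO
    for _ in range(k):
        n = S(n)
    return n

def to_int(n: Nat) -> int:
    # Count the applications of S.
    k = 0
    while n.pred is not None:
        n, k = n.pred, k + 1
    return k

print(to_int(mul(from_int(3), add(from_int(1), from_int(2)))))  # 9
```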

Many properties that hold for the natural numbers (i.e., are true when the symbols
0, S, +, * have their usual interpretation and all variables range over the natural numbers)
can be proved in this theory (Peano’s arithmetic), but not all! This is another very famous
result of Gödel known as Gödel’s incompleteness theorem (1931). However, the topic of
incompleteness is definitely outside the scope of this course, so we will not say any more about
it. Another very interesting theory is set theory. There are a number of axiomatizations of
set theory and we will discuss one of them (ZF) very briefly in the next section.

We close this section by repeating something we said earlier: There isn’t just one logic
but instead, many logics. In addition to classical and intuitionistic logic (propositional and
first-order), there are: modal logics, higher-order logics and linear logic, a logic due to
Jean-Yves Girard, attempting to unify classical and intuitionistic logic (among other goals). An
excellent introduction to these logics can be found in Troelstra and Schwichtenberg [41]. We
warn our readers that most presentations of linear logic are (very) difficult to follow. This is
definitely true of Girard’s seminal paper [22]. A more approachable version can be found in
Girard, Lafont and Taylor [21], but most readers will still wonder what hit them when they
attempt to read it.

In computer science, there is also dynamic logic, used to prove properties of programs,
and temporal logic and its variants (introduced into computer science by A. Pnueli), used to prove properties of
real-time systems. So, logic is alive and well! Also, take a look at CSE482!



1.8 Basic Concepts of Set Theory

Having learned some fundamental notions of logic, we are now at a good point, before proceeding
to more interesting things such as functions and relations, to go through a very quick review
of some basic concepts of set theory. This section will take the very “naive” point of view
that a set is a collection of objects, the collection being regarded as a single object. Having
first-order logic at our disposal, we could formalize set theory very rigorously in terms of
axioms. This was done by Zermelo first (1908) and in a more satisfactory form by Zermelo
and Fraenkel in 1921, in a theory known as the “Zermelo-Fraenkel” (ZF) axioms. Another
axiomatization was given by John von Neumann in 1925 and later improved by Bernays in
1937. A modification of Bernays’s axioms was used by Kurt Gödel in 1940. This approach
is now known as “von Neumann-Bernays” (VNB) or “Gödel-Bernays” (GB) set theory.
There are many books that give an axiomatic presentation of set theory. Among them, we 
recommend Enderton [14], which we find remarkably clear and elegant, Suppes [40] (a little 
more advanced) and Halmos [27], a classic (at a more elementary level). 

It must be said, however, that set theory was first created by Georg Cantor (1845-1918)
between 1871 and 1879. Cantor’s work was not unanimously well received by
all mathematicians. Cantor regarded infinite objects as objects to be treated in much the
same way as finite sets, a point of view that was shocking to a number of very prominent
mathematicians who bitterly attacked him (among them, the powerful Kronecker). Also,
it turns out that some paradoxes in set theory popped up in the early 1900s, in particular,
Russell’s paradox. Russell’s paradox (found by Russell in 1902) has to do with the

“set of all sets that are not members of themselves”

which we denote by

$$R = \{x \mid x \notin x\}.$$

(In general, the notation $\{x \mid P\}$ stands for the set of all objects satisfying the property P.)

Now, classically, either $R \in R$ or $R \notin R$. However, if $R \in R$, then the definition of R
says that $R \notin R$; if $R \notin R$, then again, the definition of R says that $R \in R$!
So, we have a contradiction and the existence of such a set is a paradox. The problem
is that we are allowing a property (here, $P(x) = x \notin x$), which is “too wild” and circular
in nature. As we will see, the way out, as found by Zermelo, is to place a restriction on the
property P and to also make sure that P picks out elements from some already given set
(see the Subset Axioms below).
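As an informal analogy only (not a formalization of set theory), one can model a naive “set” as a Python predicate, writing s(x) for x ∈ s. Unrestricted comprehension then lets us write down R directly, and asking whether R ∈ R never settles: each evaluation step asks the same question again with the answer negated, until Python gives up with a `RecursionError`. The names `R`, `empty` and `is_R_in_R` are hypothetical.

```python
from typing import Callable, Optional

# Analogy: a naive "set" is a membership test, so "x ∈ s" is written s(x).
def R(x: Callable) -> bool:
    # R = {x | x ∉ x}: x belongs to R exactly when x does not belong to itself.
    return not x(x)

def empty(x: Callable) -> bool:
    # The empty "set": nothing belongs to it.
    return False

def is_R_in_R() -> Optional[bool]:
    try:
        return R(R)  # R ∈ R iff not (R ∈ R): evaluation never reaches a value
    except RecursionError:
        return None  # no truth value could be computed
```

For an ordinary “set” such as `empty`, the question R(empty) has an answer (it is `True`, since the empty set is not a member of itself); it is only R(R) that goes around in circles.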

The appearance of these paradoxes prompted mathematicians, with Hilbert among their
leaders, to put set theory on firmer grounds. This was achieved by Zermelo, Fraenkel, von
Neumann, Bernays and Gödel, to name only the major players.

In what follows, we assume that we are working in classical logic. We will introduce
various operations on sets using definitions involving the logical connectives $\land$, $\lor$, $\neg$ and the
quantifiers $\forall$ and $\exists$. Ensuring the existence of some of these sets requires some of the axioms of set
theory, but we will be rather casual about that.

Given a set, A, we write that some object, a, is an element of (belongs to) the set A as

$$a \in A$$

and that a is not an element of A (does not belong to A) as

$$a \notin A.$$

When are two sets A and B equal? This corresponds to the first axiom of set theory, 
called 

Extensionality Axiom 

Two sets A and B are equal iff they have exactly the same elements, that is 

$$\forall x\,(x \in A \Rightarrow x \in B) \land \forall x\,(x \in B \Rightarrow x \in A).$$

The above says: Every element of A is an element of B and conversely. 

There is a special set having no elements at all, the empty set, denoted $\emptyset$. This is the

Empty Set Axiom

There is a set having no members. This set is denoted $\emptyset$.

Remark: Beginners often wonder whether there is more than one empty set. For example, 
is the empty set of professors distinct from the empty set of potatoes? 

The answer is, by the extensionality axiom, there is only one empty set! 

Given any two objects a and b, we can form the set {a, b} containing exactly these two 
objects. Amazingly enough, this must also be an axiom: 

Pairing Axiom 
Given any two objects a and b (think sets), there is a set, {a, b}, having as members just
a and b.

Observe that if a and b are identical, then we have the set {a, a}, which is denoted by 
{a} and is called a singleton set (this set has a as its only element). 
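Python’s `frozenset` happens to compare sets extensionally, so it gives a quick (finite, informal) sanity check of the remarks above. The helper names `same_elements` and `pair` are hypothetical; `same_elements` spells out the extensionality condition explicitly instead of relying on `==`.

```python
def same_elements(A: frozenset, B: frozenset) -> bool:
    # Extensionality: every element of A is in B, and conversely.
    return all(x in B for x in A) and all(x in A for x in B)

def pair(a, b) -> frozenset:
    # Pairing: the set {a, b}; when a == b this is the singleton {a}.
    return frozenset({a, b})

professors = frozenset()  # "the empty set of professors"
potatoes = frozenset()    # "the empty set of potatoes"
print(same_elements(professors, potatoes))  # True: there is only one empty set
print(pair("a", "a") == frozenset({"a"}))   # True: {a, a} = {a}
```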

To form bigger sets, we use the union operation. This too requires an axiom. 

Union Axiom (Version 1) 

For any two sets A and B, there is a set, $A \cup B$, called the union of A and B, defined by

$$x \in A \cup B \quad\text{iff}\quad (x \in A) \lor (x \in B).$$

This reads, x is a member of $A \cup B$ if either x belongs to A or x belongs to B (or both). We
also write

$$A \cup B = \{x \mid x \in A \text{ or } x \in B\}.$$

Using the union operation, we can form bigger sets by taking unions with singletons. For 
example, we can form 

$$\{a, b, c\} = \{a, b\} \cup \{c\}.$$

Remark: We can systematically construct bigger and bigger sets by the following method: 
Given any set, A, let 

$$A^+ = A \cup \{A\}.$$

If we start from the empty set, we obtain sets that can be used to define the natural numbers
and the + operation corresponds to the successor function on the natural numbers, i.e.,
$n \mapsto n + 1$.
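Here is a short sketch of the binary union and of building {a, b, c} one singleton at a time, starting from the empty set. The names `union` and `from_singletons` are hypothetical helpers, and Python’s `frozenset` stands in for sets.

```python
from functools import reduce

def union(A: frozenset, B: frozenset) -> frozenset:
    # x ∈ A ∪ B iff x ∈ A or x ∈ B: collect the members of both sets.
    return frozenset([*A, *B])

def from_singletons(*items) -> frozenset:
    # {a, b, c} = ((∅ ∪ {a}) ∪ {b}) ∪ {c}
    return reduce(union, (frozenset({x}) for x in items), frozenset())

print(from_singletons("a", "b", "c") == union(frozenset({"a", "b"}), frozenset({"c"})))  # True
```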

Another operation is the power set formation. It is indeed a “powerful” operation, in
the sense that it allows us to form very big sets. For this, it is helpful to define the notion
of inclusion between sets. Given any two sets, A and B, we say that A is a subset of B (or
that A is included in B), denoted $A \subseteq B$, iff every element of A is also an element of B, i.e.

$$\forall x\,(x \in A \Rightarrow x \in B).$$

We say that A is a proper subset of B iff $A \subseteq B$ and $A \neq B$. This implies that there is
some $b \in B$ with $b \notin A$. We usually write $A \subset B$.

Observe that the equality of two sets can be expressed by

$$A = B \quad\text{iff}\quad A \subseteq B \text{ and } B \subseteq A.$$



Power Set Axiom

Given any set, A, there is a set, $\mathcal{P}(A)$ (also denoted $2^A$), called the power set of A, whose
members are exactly the subsets of A, i.e.,

$$X \in \mathcal{P}(A) \quad\text{iff}\quad X \subseteq A.$$

For example, if $A = \{a, b, c\}$, then

$$\mathcal{P}(A) = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\}\},$$

a set containing 8 elements. Note that the empty set and A itself are always members of
$\mathcal{P}(A)$.

Remark: If A has n elements, it is not hard to show that $\mathcal{P}(A)$ has $2^n$ elements. For this
reason, many people, including me, prefer the notation $2^A$ for the power set of A.
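The claim that $\mathcal{P}(A)$ has $2^n$ elements when A has n elements is easy to check on small examples. The sketch below (with a hypothetical `power_set` helper) collects the subsets of each size r with `itertools.combinations`; since there are $\binom{n}{r}$ of them, the total is $\sum_r \binom{n}{r} = 2^n$.

```python
from itertools import combinations

def power_set(A) -> frozenset:
    # P(A) = {X | X ⊆ A}, gathered by subset size r = 0, 1, ..., |A|.
    elems = list(A)
    return frozenset(
        frozenset(c)
        for r in range(len(elems) + 1)
        for c in combinations(elems, r)
    )

P = power_set({"a", "b", "c"})
print(len(P))  # 8
```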

At this stage, we would like to define intersection and complementation. For this, given 
any set, A, and given a property, P, (specified by a first-order formula) we need to be able 
to define the subset of A consisting of those elements satisfying P. This subset is denoted 
by 

$$\{x \in A \mid P\}.$$

Unfortunately, there are problems with this construction. If the formula, P, is somehow a
circular definition and refers to the subset that we are trying to define, then some paradoxes
may arise!

The way out is to place a restriction on the formula used to define our subsets, and 
this leads to the subset axioms, first formulated by Zermelo. These axioms are also called 
comprehension axioms or axioms of separation. 

Subset Axioms 

For every first-order formula, P, we have the axiom:

$$\forall A\, \exists X\, \forall x\,(x \in X \Leftrightarrow (x \in A) \land P),$$

where P does not contain X as a free variable. (However, P may contain x free.)

The subset axiom says that for every set, A, there is a set, X, consisting exactly of those
elements of A for which P holds. For short, we usually write

$$X = \{x \in A \mid P\}.$$

As an example, consider the formula

$$P(B, x) = x \in B.$$

Then, the subset axiom says

$$\forall A\, \exists X\, \forall x\,(x \in X \Leftrightarrow (x \in A \land x \in B)),$$

which means that X is the set of elements that belong both to A and B. This is called the
intersection of A and B, denoted by $A \cap B$. Note that

$$A \cap B = \{x \mid x \in A \text{ and } x \in B\}.$$
We can also define the relative complement of B in A, denoted $A - B$, given by the
formula $P(x, B) = x \notin B$, so that

$$A - B = \{x \mid x \in A \text{ and } x \notin B\}.$$

In particular, if A is any given set and B is any subset of A, the set $A - B$ is also denoted
$\overline{B}$ and is called the complement of B. Because $\land$, $\lor$ and $\neg$ satisfy the de Morgan laws
(remember, we are dealing with classical logic), for any set X, the operations of union,
intersection and complementation on subsets of X (complements being taken relative to X)
satisfy various identities, in particular the de Morgan laws

$$\overline{A \cap B} = \overline{A} \cup \overline{B}$$

$$\overline{A \cup B} = \overline{A} \cap \overline{B}$$

$$\overline{\overline{A}} = A,$$

and various associativity, commutativity and distributivity laws.
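The subset axiom only ever carves a new set out of an already given set A, which is exactly what filtering a finite collection does. The sketch below (hypothetical helper names `separate`, `intersection`, `difference` and `check_de_morgan`) defines intersection and relative complement this way and checks the de Morgan laws for subsets of a fixed finite X, with complements taken relative to X. An exhaustive check on a small X is evidence, of course, not a proof.

```python
from typing import Callable

def separate(A: frozenset, P: Callable[[object], bool]) -> frozenset:
    # Subset axiom: {x ∈ A | P(x)}; only members of A are ever considered.
    return frozenset(x for x in A if P(x))

def intersection(A: frozenset, B: frozenset) -> frozenset:
    return separate(A, lambda x: x in B)      # P(B, x) = x ∈ B

def difference(A: frozenset, B: frozenset) -> frozenset:
    return separate(A, lambda x: x not in B)  # P(x, B) = x ∉ B

def check_de_morgan(X: frozenset, A: frozenset, B: frozenset) -> bool:
    # Complement relative to X; assumes A and B are subsets of X.
    def comp(S: frozenset) -> frozenset:
        return difference(X, S)
    return (comp(intersection(A, B)) == comp(A) | comp(B)
            and comp(A | B) == intersection(comp(A), comp(B))
            and comp(comp(A)) == A)
```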

So far, the union axiom only applies to two sets but later on we will need to form infinite 
unions. Thus, it is necessary to generalize our union axiom as follows: 

Union Axiom (Final Version) 

Given any set X (think of X as a set of sets), there is a set, $\bigcup X$, defined so that

$$x \in \bigcup X \quad\text{iff}\quad \exists B\,(B \in X \land x \in B).$$

This says that $\bigcup X$ consists of all elements that belong to some member of X.

If we take $X = \{A, B\}$, where A and B are two sets, we see that

$$\bigcup \{A, B\} = A \cup B,$$

and so, our final version of the union axiom subsumes our previous union axiom which we
now discard in favor of the more general version.

Observe that

$$\bigcup \{A\} = A, \qquad \bigcup \{A_1, \ldots, A_n\} = A_1 \cup \cdots \cup A_n,$$

and in particular, $\bigcup \emptyset = \emptyset$.

Using the subset axiom, we can also define infinite intersections. For every nonempty
set, X, there is a set, $\bigcap X$, defined by

$$x \in \bigcap X \quad\text{iff}\quad \forall B\,(B \in X \Rightarrow x \in B).$$

The existence of $\bigcap X$ is justified as follows: Since X is nonempty, it contains some set,
A; let

$$P(X, x) = \forall B\,(B \in X \Rightarrow x \in B).$$

Then, the subset axiom asserts the existence of a set Y so that for every x,

$$x \in Y \quad\text{iff}\quad x \in A \text{ and } P(X, x),$$

which, since $A \in X$ and therefore $P(X, x)$ already implies $x \in A$, is equivalent to

$$x \in Y \quad\text{iff}\quad P(X, x).$$

Therefore, the set Y is our desired set, $\bigcap X$.

Observe that

$$\bigcap \{A, B\} = A \cap B, \qquad \bigcap \{A_1, \ldots, A_n\} = A_1 \cap \cdots \cap A_n.$$
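For a finite family X of finite sets, the definitions of $\bigcup X$ and $\bigcap X$ translate directly into code. In this sketch (hypothetical names `big_union` and `big_intersection`), the intersection follows the existence argument above: pick some A in X and keep the members of A that lie in every B in X. An empty family is rejected, since the definition of $\bigcap X$ requires X to be nonempty.

```python
def big_union(X) -> frozenset:
    # x ∈ ⋃X iff ∃B (B ∈ X ∧ x ∈ B); in particular ⋃∅ = ∅.
    return frozenset(x for B in X for x in B)

def big_intersection(X) -> frozenset:
    # x ∈ ⋂X iff ∀B (B ∈ X ⇒ x ∈ B); defined only for nonempty X.
    family = list(X)
    if not family:
        raise ValueError("the intersection of the empty family is not defined")
    A = family[0]  # some A ∈ X, as in the existence argument
    return frozenset(x for x in A if all(x in B for B in family))
```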

Note that $\bigcap \emptyset$ is not defined. Intuitively, it would have to be the set of all sets, but such a
set does not exist, as we now show. This is basically a version of Russell’s paradox.



Theorem 1.8.1 (Russell) There is no set of all sets, i.e., there is no set to which every 
other set belongs. 

Proof. Let A be any set. We construct a set, B, that does not belong to A. If the set of all
sets existed, then we could produce a set that does not belong to it, a contradiction. Let

$$B = \{a \in A \mid a \notin a\}.$$

We claim that $B \notin A$. We proceed by contradiction, so assume $B \in A$. However, by the
definition of B, we have

$$B \in B \quad\text{iff}\quad B \in A \text{ and } B \notin B.$$

Since $B \in A$, the above is equivalent to

$$B \in B \quad\text{iff}\quad B \notin B,$$

which is a contradiction. Therefore, $B \notin A$ and we deduce that there is no set of all sets. □

Remarks:

(1) We should justify why the equivalence $R \in R$ iff $R \notin R$ is a contradiction. What we
mean by “a contradiction” is that if the above equivalence holds, then we can derive $\bot$
(falsity) and thus, all propositions become provable. This is because, for any proposition P,
the proposition $\neg(P \Leftrightarrow \neg P)$ is provable; so if $P \Leftrightarrow \neg P$ were also provable, we could
derive $\bot$. We leave the proof of this fact as an easy exercise for the reader. By the way, this
holds classically as well as intuitionistically.

(2) We said that in the subset axiom, the variable X is not allowed to occur free in
P. A slight modification of Russell’s paradox shows that allowing X to be free in
P leads to paradoxical sets. For example, pick A to be any nonempty set and set
$P(X, x) = x \notin X$. Then, look at the (alleged) set

$$X = \{x \in A \mid x \notin X\}.$$

As an exercise, the reader should show that X is empty iff X is nonempty!
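The exercise in Remark (1) takes only a few lines. Here is one possible derivation, written informally in the style of natural deduction (the layout is ours). It uses only the rules for implication, reading $\neg Q$ as $Q \Rightarrow \bot$, together with the two directions of $\Leftrightarrow$, so no classical reasoning is involved and it is intuitionistically valid.

```latex
\begin{array}{lll}
1. & P \Leftrightarrow \neg P & \text{hypothesis} \\
2. & \quad P & \text{hypothesis} \\
3. & \quad \neg P & \text{from 1 and 2} \\
4. & \quad \bot & \text{from 2 and 3} \\
5. & \neg P & \text{discharge hypothesis 2} \\
6. & P & \text{from 1 and 5} \\
7. & \bot & \text{from 5 and 6} \\
8. & \neg(P \Leftrightarrow \neg P) & \text{discharge hypothesis 1}
\end{array}
```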

This is as far as we can go with the elementary notions of set theory that we have
introduced so far. In order to proceed further, we need to define relations and functions,
which is the subject of the next chapter.

The reader may also wonder why we have not yet discussed infinite sets. This is because
we don’t know how to show that they exist! Again, perhaps surprisingly, this takes another
axiom, the axiom of infinity. We also have to define when a set is infinite. However, we will
not go into this right now. Instead, we will accept that the set of natural numbers, $\mathbb{N}$, exists
and is infinite. Once we have the notion of a function, we will be able to show that other
sets are infinite by comparing their “size” with that of $\mathbb{N}$ (this is the purpose of cardinal
numbers, but this would lead us too far afield).

Remark: In an axiomatic presentation of set theory, the natural numbers can be defined
from the empty set using the operation $A \mapsto A^+ = A \cup \{A\}$ introduced just after the union
axiom. The idea, due to von Neumann, is that

$$0 = \emptyset$$

$$1 = 0^+ = \{\emptyset\} = \{0\}$$

$$2 = 1^+ = \{\emptyset, \{\emptyset\}\} = \{0, 1\}$$

$$3 = 2^+ = \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\} = \{0, 1, 2\}$$

$$\vdots$$

$$n + 1 = n^+ = \{0, 1, 2, \ldots, n\}$$

$$\vdots$$

However, the above definition presupposes induction (it is hidden in the dots)! Thus, we have to
proceed in a different way to avoid circularities.
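The von Neumann construction is easy to mimic with nested `frozenset`s. In this sketch (hypothetical names `successor` and `von_neumann`), note that the `for` loop in `von_neumann` is itself a definition by recursion on an integer, which is precisely the kind of circularity the remark warns about: the code illustrates what these sets look like, it does not justify their existence.

```python
def successor(A: frozenset) -> frozenset:
    # A+ = A ∪ {A}
    return A | frozenset({A})

def von_neumann(n: int) -> frozenset:
    # 0 = ∅ and k + 1 = k+, iterated n times.
    k = frozenset()
    for _ in range(n):
        k = successor(k)
    return k

three = von_neumann(3)
print(three == frozenset({von_neumann(0), von_neumann(1), von_neumann(2)}))  # True
```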

Definition 1.8.2 We say that a set, X, is inductive iff

(1) $\emptyset \in X$;

(2) For every $A \in X$, we have $A^+ \in X$.
Axiom of Infinity

There is some inductive set. 

Having done this, we make the 

Definition 1.8.3 A natural number is a set that belongs to every inductive set. 

Using the subset axioms, we can show that there is a set whose members are exactly
the natural numbers. The argument is very similar to the one used to prove that arbitrary
intersections exist. By the Axiom of Infinity, there is some inductive set, say A. Now consider
the property, P(x), which asserts that x belongs to every inductive set. By the subset axioms
applied to P, there is a set, $\mathbb{N}$, such that

$$x \in \mathbb{N} \quad\text{iff}\quad x \in A \text{ and } P(x),$$

and since A is inductive and P says that x belongs to every inductive set (so that P(x) implies
$x \in A$), the above is equivalent to

$$x \in \mathbb{N} \quad\text{iff}\quad P(x),$$

that is, $x \in \mathbb{N}$ iff x belongs to every inductive set. Therefore, the set of all natural numbers,
$\mathbb{N}$, does exist. The set $\mathbb{N}$ is also denoted $\omega$. We can now easily show

Theorem 1.8.4 The set $\mathbb{N}$ is inductive and it is a subset of every inductive set.

Proof. Recall that $\emptyset$ belongs to every inductive set; so, $\emptyset$ is a natural number (0). As $\mathbb{N}$ is
the set of natural numbers, 0 (= $\emptyset$) belongs to $\mathbb{N}$. Secondly, if $n \in \mathbb{N}$, this means that n
belongs to every inductive set (n is a natural number), which implies that $n^+ = n + 1$ belongs
to every inductive set (each inductive set is closed under $A \mapsto A^+$), which means that $n + 1$
is a natural number, i.e., $n + 1 \in \mathbb{N}$. Since $\mathbb{N}$ is the set of natural numbers and since every
natural number belongs to every inductive set, we conclude that $\mathbb{N}$ is a subset of every inductive set. □

It would be tempting to view $\mathbb{N}$ as the intersection of the family of inductive sets, but
unfortunately this family is not a set; it is too “big” to be a set.

As a consequence of the above fact, we obtain the

Induction Principle for $\mathbb{N}$: Any inductive subset of $\mathbb{N}$ is equal to $\mathbb{N}$ itself.

Now, in our setting, $0 = \emptyset$ and $n^+ = n + 1$, so the above principle can be restated as
follows:

Induction Principle for $\mathbb{N}$ (Version 2): For any subset, $S \subseteq \mathbb{N}$, if $0 \in S$ and $n + 1 \in S$
whenever $n \in S$, then $S = \mathbb{N}$.

We will see how to rephrase this induction principle a little more conveniently in terms 
of the notion of function in the next chapter. 



Remarks: 
1. We still don’t know what an infinite set is or, for that matter, that $\mathbb{N}$ is infinite! This
will be shown in the next chapter (see Corollary 2.9.7).

2. Zermelo-Fraenkel set theory (+ Choice) has three more axioms that we did not discuss:
The Axiom of Choice, the Replacement Axioms and the Regularity Axiom. For our
purposes, only the Axiom of Choice will be needed and we will introduce it in Chapter
2. Let us just say that the Replacement Axioms are needed to deal with ordinals and
cardinals and that the Regularity Axiom is needed to show that every set is grounded.
For more about these axioms, see Enderton [14], Chapter 7. The Regularity Axiom
also implies that no set can be a member of itself, an eventuality that is not ruled out
by our current set of axioms!




Chapter 2 



Relations, Functions, Partial Functions

2.1 What is a Function? 

We use functions all the time in Mathematics and in Computer Science. But, what exactly 
is a function? 

Roughly speaking, a function, f, is a rule or mechanism, which takes input values in
some input domain, say X, and produces output values in some output domain, say Y, in
such a way that to each input $x \in X$ corresponds a unique output value $y \in Y$, denoted
$f(x)$. We usually write $y = f(x)$, or better, $x \mapsto f(x)$.

Often, functions are defined by some sort of closed expression (a formula), but not always. 
For example, the formula 

y = 2x 

defines a function. Here, we can take both the input and output domain to be $\mathbb{R}$, the set of
real numbers. Instead, we could have taken $\mathbb{N}$, the set of natural numbers; this gives us a
different function. In the above example, 2x makes sense for all input x, whether the input
domain is $\mathbb{N}$ or $\mathbb{R}$, so our formula yields a function defined for all of its input values.

Now, look at the function defined by the formula

$$y = \frac{x}{2}.$$

If the input and output domains are both $\mathbb{R}$, again this function is well-defined. However,
what if we assume that the input and output domains are both $\mathbb{N}$? This time, we have a
problem when x is odd. For example, $\frac{1}{2}$ is not an integer, so our function is not defined for
all of its input values. It is a partial function. Observe that this function is defined for the
set of even natural numbers (sometimes denoted $2\mathbb{N}$) and this set is called the domain (of
definition) of f. If we enlarge the output domain to be $\mathbb{Q}$, the set of rational numbers, then
our function is defined for all inputs.
Another example of a partial function is given by

$$y = \frac{x + 1}{x^2 - 3x + 2},$$

assuming that both the input and output domains are $\mathbb{R}$. Observe that for x = 1 and x = 2,
the denominator vanishes, so we get the undefined fractions $\frac{2}{0}$ and $\frac{3}{0}$. The function “blows
up” for x = 1 and x = 2: its value is “infinity” ($\infty$), which is not an element of $\mathbb{R}$. So, the
domain of f is $\mathbb{R} - \{1, 2\}$.

In summary, functions need not be defined for all of their input values and we need to
pay close attention to both the input and the output domain of our functions.
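One common programming convention, assumed here purely for illustration, is to represent “undefined” by `None`. The hypothetical functions below encode the partial functions just discussed: x/2 with input and output domain $\mathbb{N}$ (defined only on $2\mathbb{N}$), the same rule with output domain $\mathbb{Q}$ (now total), and $(x + 1)/(x^2 - 3x + 2)$, undefined at 1 and 2. Floating-point numbers only approximate $\mathbb{R}$, so the last one is a sketch rather than a faithful model.

```python
from fractions import Fraction
from typing import Optional

def half_in_N(x: int) -> Optional[int]:
    # x/2 with input and output domain N: defined only on 2N.
    return x // 2 if x % 2 == 0 else None

def half_in_Q(x: int) -> Fraction:
    # The same rule with output domain Q: now defined for every input.
    return Fraction(x, 2)

def g(x: float) -> Optional[float]:
    # (x + 1) / (x^2 - 3x + 2): the denominator vanishes at x = 1 and x = 2.
    d = x * x - 3 * x + 2
    return None if d == 0 else (x + 1) / d
```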

The following example illustrates another difficulty: Consider the function given by

$$y = \sqrt{x}.$$

If we assume that the input domain is $\mathbb{R}$ and that the output domain is $\mathbb{R}_+ = \{x \in \mathbb{R} \mid x \geq 0\}$,
then this function is not defined for negative values of x. To fix this problem, we can extend
the output domain to be $\mathbb{C}$, the complex numbers. Then we can make sense of $\sqrt{x}$ when
$x < 0$. However, a new problem comes up: Every negative number, x, has two complex
square roots, $-i\sqrt{-x}$ and $+i\sqrt{-x}$ (where i is “the” square root of −1). Which of the two
should we pick?

In this case, we could systematically pick $+i\sqrt{-x}$, but what if we extend the input domain
to be $\mathbb{C}$? Then, it is not clear which of the two complex roots should be picked, as there is
no obvious total order on $\mathbb{C}$. We can treat f as a multi-valued function, that is, a function
that may return several possible outputs for a given input value.

Experience shows that it is awkward to deal with multi-valued functions and that it is
best to treat them as relations (or to change the output domain to be a power set, which is
equivalent to viewing the function as a relation).
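Following the suggestion to change the output domain to a power set, a multi-valued square root can be sketched as a function from $\mathbb{C}$ to $2^{\mathbb{C}}$ that returns the set of both roots. Python’s standard `cmath.sqrt` returns only one root (the principal one); the hypothetical `sqrt_all` adds its negative.

```python
import cmath

def sqrt_all(z: complex) -> frozenset:
    # Both complex square roots of z as a set: {r, -r}, r the principal root.
    r = cmath.sqrt(z)
    return frozenset({r, -r})

roots = sqrt_all(-4)  # the two roots 2i and -2i
```

For z = 0 the two roots coincide, so the returned set is a singleton.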

Let us give one more example showing that it is not always easy to make sure that a
formula is a proper definition of a function. Consider the function from $\mathbb{R}$ to $\mathbb{R}$ given by

$$f(x) = 1 + \sum_{n=1}^{\infty} \frac{x^n}{n!}.$$

Here, n! is the factorial function, defined by

$$n! = n \cdot (n - 1) \cdots 2 \cdot 1.$$

How do we make sense of this infinite expression? Well, that’s where analysis comes in,
with the notion of limit of a series, etc. It turns out that f(x) is the exponential function
$f(x) = e^x$. Actually, $e^x$ is even defined when x is a complex number or even a square matrix
(with real or complex entries)! Don’t panic, we will not use such functions in this course.
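Although the infinite sum needs analysis to make sense, its partial sums are easy to compute, and for a fixed x they approach $e^x$ quickly. The sketch below (hypothetical name `exp_partial_sum`) updates each term from the previous one, using $\frac{x^n}{n!} = \frac{x^{n-1}}{(n-1)!} \cdot \frac{x}{n}$, which avoids computing large factorials; `math.exp` from Python’s standard library serves as the reference value.

```python
import math

def exp_partial_sum(x: float, terms: int) -> float:
    # 1 + sum_{n=1}^{terms} x^n / n!
    total, term = 1.0, 1.0
    for n in range(1, terms + 1):
        term *= x / n  # term is now x^n / n!
        total += term
    return total

print(abs(exp_partial_sum(1.0, 20) - math.e) < 1e-12)  # True
```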
Another issue comes up, that is, the notion of computability. In all of our examples,
and for most functions we will ever need to compute, it is clear that it is possible to give
a mechanical procedure, i.e., a computer program which computes our functions (even if it is
hard to write such a program or if such a program takes a very long time to compute the
output from the input).

Unfortunately, there are functions which, although well-defined mathematically, are not
computable! For an example, let us go back to first-order logic and the notion of provable
proposition. Given a finite (or countably infinite) alphabet of function, predicate, constant
symbols, and a countable supply of variables, it is quite clear that the set T of all propositions
built up from these symbols and variables can be enumerated systematically. We can define
the function, Prov, with input domain T and output domain {0, 1}, so that, for every
proposition $P \in T$,

$$\mathrm{Prov}(P) = \begin{cases} 1 & \text{if } P \text{ is provable (classically)} \\ 0 & \text{if } P \text{ is not provable (classically).} \end{cases}$$

Mathematically, for every proposition, $P \in T$, either P is provable or it is not, so this
function makes sense. However, by Church’s Theorem (see Section 1.7), we know that there
is no computer program that will terminate for all input propositions and give an answer in a
finite number of steps! So, although the function Prov makes sense as an abstract function,
it is not computable. Is this a paradox? No, if we are careful when defining a function
not to incorporate in the definition any notion of computability and instead to take a more
abstract and, in some sense, naive view of a function as some kind of input/output
process given by pairs (input value, output value) (without worrying about the way the
output is “computed” from the input). A rigorous way to proceed is to use the notion of
ordered pair and of graph of a function. Before we do so, let us point out some facts about
functions that were revealed by our examples:



1. In order to define a function, in addition to defining its input/output behavior, it is
also important to specify what its input domain and its output domain are.

2. Some functions may not be defined for all of their input values; a function can be a
partial function.

3. The input/output behavior of a function can be defined by a set of ordered pairs. As
we will see next, this is the graph of the function.



We are now going to formalize the notion of function (possibly partial) using the concept 
of ordered pair. 
2.2 Ordered Pairs, Cartesian Products, Relations, Functions, Partial Functions

Given two sets, A and B, one of the basic constructions of set theory is the formation of an
ordered pair, $\langle a, b \rangle$, where $a \in A$ and $b \in B$. Sometimes, we also write (a, b) for an ordered
pair. The main property of ordered pairs is that if $\langle a_1, b_1 \rangle$ and $\langle a_2, b_2 \rangle$ are ordered pairs,
where $a_1, a_2 \in A$ and $b_1, b_2 \in B$, then

$$\langle a_1, b_1 \rangle = \langle a_2, b_2 \rangle \quad\text{iff}\quad a_1 = a_2 \text{ and } b_1 = b_2.$$

Observe that this property implies that

$$\langle a, b \rangle \neq \langle b, a \rangle,$$

unless a = b. Thus, the ordered pair, $\langle a, b \rangle$, is not a notational variant for the set {a, b};
implicit to the notion of ordered pair is the fact that there is an order (even though we
have not defined this notion yet!) among the elements of the pair. Indeed, in $\langle a, b \rangle$, the
element a comes first and b comes second. Accordingly, given an ordered pair, $p = \langle a, b \rangle$,
we will denote a by $pr_1(p)$ and b by $pr_2(p)$ (first and second projection or first and second
coordinate).

Remark: Readers who like set theory will be happy to hear that an ordered pair, (a, b), can
be defined as the set $\{\{a\}, \{a, b\}\}$. This definition is due to Kuratowski, 1921. An earlier
(more complicated) definition given by N. Wiener in 1914 is $\{\{\{a\}, \emptyset\}, \{\{b\}\}\}$.
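The Kuratowski encoding can be tested on small values with nested `frozenset`s. In this sketch, `kpair`, `pr1` and `pr2` are hypothetical Python counterparts of the pair and of $pr_1$, $pr_2$: the first coordinate is recovered as the unique element common to all members of the pair, and the second as the remaining element of their union (or the first coordinate again when a = b). The final line checks, for a few small values, the characteristic property of ordered pairs.

```python
def kpair(a, b) -> frozenset:
    # Kuratowski: (a, b) = {{a}, {a, b}}
    return frozenset({frozenset({a}), frozenset({a, b})})

def pr1(p: frozenset):
    # The first coordinate is the unique element common to every member of p.
    (a,) = frozenset.intersection(*p)
    return a

def pr2(p: frozenset):
    # The second coordinate is the other element of the union, or a itself if a = b.
    a = pr1(p)
    rest = frozenset.union(*p) - {a}
    return next(iter(rest)) if rest else a

# (a1, b1) = (a2, b2) iff a1 = a2 and b1 = b2, checked on a small range.
vals = range(3)
print(all((kpair(a1, b1) == kpair(a2, b2)) == (a1 == a2 and b1 == b2)
          for a1 in vals for b1 in vals for a2 in vals for b2 in vals))  # True
```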

Now, from set theory, it can be shown that given two sets, A and B, the set of all ordered
pairs (a, b), with $a \in A$ and $b \in B$, is a set denoted $A \times B$ and called the Cartesian product
of A and B (in that order). By convention, we agree that $\emptyset \times B = A \times \emptyset = \emptyset$. To simplify
the terminology, we often say pair for ordered pair, with the understanding that pairs are
always ordered (otherwise, we should say set).

Of course, given three sets, A, B, C, we can form $(A \times B) \times C$ and we call its elements
(ordered) triples (or triplets). To simplify the notation, we write (a, b, c) instead of ((a, b), c).
More generally, given n sets $A_1, \ldots, A_n$ ($n \geq 2$), we define the set of n-tuples,
$A_1 \times A_2 \times \cdots \times A_n$, as $(\cdots((A_1 \times A_2) \times A_3) \times \cdots) \times A_n$. An element of $A_1 \times A_2 \times \cdots \times A_n$
is denoted by $(a_1, \ldots, a_n)$ (an n-tuple). We agree that when n = 1, we just have $A_1$ and a
1-tuple is just an element of $A_1$.

We now have all we need to define relations. 

Definition 2.2.1 Given two sets, A and B, a (binary) relation, R, between A and B is any
subset $R \subseteq A \times B$ of ordered pairs from $A \times B$. When $(a, b) \in R$, we also write aRb and we
say that a and b are related by R. The set

$$\mathrm{dom}(R) = \{a \in A \mid \exists b \in B,\ (a, b) \in R\}$$

is called the domain of R and the set

$$\mathrm{range}(R) = \{b \in B \mid \exists a \in A,\ (a, b) \in R\}$$

is called the range of R. Note that $\mathrm{dom}(R) \subseteq A$ and $\mathrm{range}(R) \subseteq B$. When A = B, we often
say that R is a (binary) relation over A.

Among all relations between A and B, we mention three relations that play a special
role:

1. $R = \emptyset$, the empty relation. Note that $\mathrm{dom}(\emptyset) = \mathrm{range}(\emptyset) = \emptyset$. This is not a very
exciting relation!

2. When A = B, we have the identity relation,

$$\mathrm{id}_A = \{(a, a) \mid a \in A\}.$$

The identity relation relates every element to itself, and that’s it! Note that
$\mathrm{dom}(\mathrm{id}_A) = \mathrm{range}(\mathrm{id}_A) = A$.

3. The relation $A \times B$ itself. This relation relates every element of A to every element of
B. Note that, when A and B are both nonempty, $\mathrm{dom}(A \times B) = A$ and $\mathrm{range}(A \times B) = B$.
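Finite relations are easy to experiment with if we represent ordered pairs by Python tuples and a relation by a `frozenset` of tuples. The helpers below (`dom`, `rng`, `identity`, `product`; all hypothetical names, with `rng` avoiding a clash with Python’s built-in `range`) follow the definitions above. The last line shows that $\mathrm{dom}(A \times B) = A$ can fail when B is empty.

```python
def dom(R: frozenset) -> frozenset:
    # dom(R) = {a | ∃b, (a, b) ∈ R}
    return frozenset(a for (a, _) in R)

def rng(R: frozenset) -> frozenset:
    # range(R) = {b | ∃a, (a, b) ∈ R}
    return frozenset(b for (_, b) in R)

def identity(A) -> frozenset:
    # id_A = {(a, a) | a ∈ A}
    return frozenset((a, a) for a in A)

def product(A, B) -> frozenset:
    # A × B, the relation relating every element of A to every element of B
    return frozenset((a, b) for a in A for b in B)

print(dom(product({1, 2}, set())))  # frozenset(): dom(A × ∅) = ∅, not A
```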

Relations can be represented graphically by pictures often called graphs. (Beware, the
term “graph” is very much overloaded. Later on, we will define what a graph is.) We depict
the elements of both sets A and B as points (perhaps with different colors) and we indicate
that $a \in A$ and $b \in B$ are related (i.e., $(a, b) \in R$) by drawing an oriented edge (an arrow)
starting from a (its source) and ending in b (its target). Here is an example:



Figure 2.1: A binary relation, R



In Figure 2.1, $A = \{a_1, a_2, a_3, a_4, a_5\}$ and $B = \{b_1, b_2, b_3, b_4\}$. Observe that $a_5$ is not
related to any element of B, $b_3$ is not related to any element of A and that some elements
of A, namely, $a_1, a_3, a_4$, are related to several elements of B.
Now, given a relation, $R \subseteq A \times B$, some element $a \in A$ may be related to several distinct
elements $b \in B$. If so, R does not correspond to our notion of a function, because we want
our functions to be single-valued. So, we impose a natural condition on relations to get
relations that correspond to functions.

Definition 2.2.2 We say that a relation, R, between two sets A and B is functional if for
every $a \in A$, there is at most one $b \in B$ so that $(a, b) \in R$. Equivalently, R is functional if
for all $a \in A$ and all $b_1, b_2 \in B$, if $(a, b_1) \in R$ and $(a, b_2) \in R$, then $b_1 = b_2$.
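Definition 2.2.2 can be checked mechanically for a finite relation given as a collection of Python tuples. The hypothetical `is_functional` helper below records, for each first coordinate, the second coordinate seen so far, and reports failure as soon as some a is paired with two different b’s, which is exactly the “equivalently” form of the definition.

```python
def is_functional(R) -> bool:
    # For all a, b1, b2: (a, b1) ∈ R and (a, b2) ∈ R imply b1 = b2.
    seen = {}
    for a, b in R:
        if a in seen and seen[a] != b:
            return False  # a is related to two distinct elements
        seen[a] = b
    return True

print(is_functional({(1, "x"), (2, "x")}), is_functional({(1, "x"), (1, "y")}))  # True False
```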

The picture in Figure 2.2 shows an example of a functional relation. 

Figure 2.2: A functional relation G

Using Definition 2.2.2, we can give a rigorous definition of a function (partial or not). 

Definition 2.2.3 A partial function, f, is a triple, $f = (A, G, B)$, where A is a set called
the input domain of f, B is a set called the output domain of f (sometimes codomain of f)
and $G \subseteq A \times B$ is a functional relation called the graph of f; we let graph(f) = G. We write
$f\colon A \to B$ to indicate that A is the input domain of f and that B is the codomain of f
and we let dom(f) = dom(G) and range(f) = range(G). For every $a \in \mathrm{dom}(f)$, the unique
element, $b \in B$, so that $(a, b) \in \mathrm{graph}(f)$ is denoted by f(a) (so, b = f(a)). Often, we say
that b = f(a) is the image of a by f. The range of f is also called the image of f and is
denoted Im(f). If dom(f) = A, we say that f is a total function, for short, a function with
domain A.
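Here is a sketch of Definition 2.2.3 for finite sets, where a partial function is literally a triple (input domain, graph, codomain). The class and method names are hypothetical. The constructor checks that the graph is a functional subset of A × B; calling the object at a point outside dom(f) raises an error, playing the role of “f produces no output here”. The example `halves` is x/2 on the input domain {0, 1, 2, 3}, which is partial since it is undefined at odd inputs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PartialFunction:
    input_domain: frozenset  # A
    graph: frozenset         # G ⊆ A × B, a functional relation
    codomain: frozenset      # B

    def __post_init__(self):
        for a, b in self.graph:
            if a not in self.input_domain or b not in self.codomain:
                raise ValueError("the graph is not a subset of A × B")
        # Distinct pairs sharing a first coordinate would violate functionality.
        if len({a for a, _ in self.graph}) != len(self.graph):
            raise ValueError("the graph is not functional")

    def dom(self) -> frozenset:
        return frozenset(a for a, _ in self.graph)

    def image(self) -> frozenset:
        return frozenset(b for _, b in self.graph)

    def is_total(self) -> bool:
        return self.dom() == self.input_domain

    def __call__(self, a):
        for x, b in self.graph:
            if x == a:
                return b
        raise ValueError(f"undefined at {a!r}")

halves = PartialFunction(frozenset(range(4)), frozenset({(0, 0), (2, 1)}), frozenset(range(4)))
```

With this encoding, the empty function $(\emptyset, \emptyset, B)$ is accepted and reports itself as total, matching Remark 2 below.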

Remarks: 

1. If f = (A, G, B) is a partial function and b = f(a) for some $a \in \mathrm{dom}(f)$, we say that
f maps a to b; we may write $f\colon a \mapsto b$. For any $b \in B$, the set

$$\{a \in A \mid f(a) = b\}$$

is denoted $f^{-1}(b)$ and called the inverse image or preimage of b by f. (It is also called
the fibre of f above b. We will explain this peculiar language later on.) Note that
$f^{-1}(b) \neq \emptyset$ iff b is in the image (range) of f. Often, a function, partial or not, is called
a map.

2. Note that Definition 2.2.3 allows A = 0. In this case, we must have G = 0 and, 
technically, (0, 0, B) is total function! It is the empty function from 0 to B. 

3. When a partial function is a total function, we don’t call it a “partial total function”, 
but simply a “function” . The usual pratice is that the term “function” refers to a 
total function. However, sometimes, we say “total function” to stress that a function 
is indeed defined on all of its input domain. 

4. Note that if a partial function f = (A, G, B) is not a total function, then dom(f) ≠ A and for all a ∈ A − dom(f), there is no b ∈ B so that (a, b) ∈ graph(f). This corresponds to the intuitive fact that f does not produce any output for any value not in its domain of definition. We can imagine that f “blows up” for this input (as in the situation where the denominator of a fraction is 0) or that the program computing f loops indefinitely for that input.

5. If f = (A, G, B) is a total function and A ≠ ∅, then B ≠ ∅.

6. For any set, A, the identity relation, id_A, is actually a function id_A: A → A.

7. Given any two sets, A and B, the rules (a, b) ↦ a = pr₁((a, b)) and (a, b) ↦ b = pr₂((a, b)) make pr₁ and pr₂ into functions pr₁: A × B → A and pr₂: A × B → B called the first and second projections.

8. A function, f: A → B, is sometimes denoted $A \xrightarrow{f} B$, with f written above the arrow. Some authors use a different kind of arrow to indicate that f is partial, for example, a dotted or dashed arrow. We will not go that far!

9. The set of all functions, f: A → B, is denoted by B^A. If A and B are finite, with A having m elements and B having n elements, it is easy to prove that B^A has n^m elements.
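Remarks 1, 2, 5 and 9 can be checked on small finite sets. The following is a minimal sketch that represents a (possibly partial) function as a Python dict from inputs to outputs, which is an assumption made for illustration. The helper names preimage and all_functions are hypothetical. The code enumerates B^A with itertools.product, so the count n^m can be observed directly. The code also shows two edge cases. When A = ∅, there is exactly one function, the empty function. When A ≠ ∅ and B = ∅, there are no functions.

```python
from itertools import product


def preimage(f, A, b):
    """f^{-1}(b) = {a in A | f(a) = b}; f is a dict, possibly partial on A."""
    return {a for a in A if a in f and f[a] == b}


def all_functions(A, B):
    """Enumerate B^A: every total function from A to B, each as a dict."""
    A = list(A)
    # Each tuple of len(A) values drawn from B determines exactly one function.
    for values in product(list(B), repeat=len(A)):
        yield dict(zip(A, values))


A, B = {1, 2, 3}, {"x", "y"}
count = sum(1 for _ in all_functions(A, B))  # n^m = 2^3 = 8
g = {1: "x", 2: "y", 3: "x"}
fibre_x = preimage(g, A, "x")  # {1, 3}
fibre_z = preimage(g, A, "z")  # empty: "z" is not in the image of g
```

The enumeration treats a function from A to B as a choice of one value in B for each of the m elements of A. This is also the idea behind the counting argument for n^m.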

The reader might wonder why, in the definition of a (total) function, f: A → B, we do not require B = Im(f), since we require that dom(f) = A.

The reason has to do with experience and convenience. It turns out that in most cases, 
we know what the domain of a function is, but it may be very hard to determine exactly 
what its image is. Thus, it is more convenient to be flexible about the codomain. As long as we know that f maps into B, we are satisfied.

For example, consider functions, f: ℝ → ℝ², from the real line into the plane. The image of such a function is a curve in the plane ℝ². Actually, to really get “decent” curves we need to impose some reasonable conditions on f, for example, to be differentiable. Continuity alone may yield very strange curves (see Section 2.10).

Figure 2.3: Lemniscate of Bernoulli

But even for a very well behaved function, f, it may be very hard to figure out what the image of f is. Consider the function, t ↦ (x(t), y(t)), given by



$$
x(t) = \frac{t(1 + t^2)}{1 + t^4}, \qquad y(t) = \frac{t(1 - t^2)}{1 + t^4}.
$$



The curve which is the image of this function, shown in Figure 2.3, is called the “lemniscate of Bernoulli”.

Observe that this curve has a self-intersection at the origin, which is not so obvious at 
first glance. 
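One way to confirm that every point of this image lies on a single quartic curve is to evaluate the parametrization exactly with rational arithmetic. The following sketch uses the implicit equation (x² + y²)² = x² − y², which the text does not state. Expanding gives x² + y² = 2t²/(1 + t⁴) and x² − y² = 4t⁴/(1 + t⁴)², so the equation holds for every t. The helper names are hypothetical. The sketch also illustrates the self-intersection. The origin is reached at t = 0, where the curve passes in the direction of (1, 1). The curve comes back toward the origin as |t| grows, now in the direction of (1, −1).

```python
from fractions import Fraction


def lemniscate_point(t):
    """Exact (x(t), y(t)) for x = t(1+t^2)/(1+t^4), y = t(1-t^2)/(1+t^4)."""
    t = Fraction(t)
    d = 1 + t**4
    return t * (1 + t**2) / d, t * (1 - t**2) / d


def on_lemniscate(x, y):
    """Assumed implicit equation of this lemniscate: (x^2+y^2)^2 = x^2 - y^2."""
    return (x**2 + y**2) ** 2 == x**2 - y**2


# Sample rational parameters t = k/4 for k = -20, ..., 20.
samples = [lemniscate_point(Fraction(k, 4)) for k in range(-20, 21)]
all_on_curve = all(on_lemniscate(x, y) for x, y in samples)
far_point = lemniscate_point(1000)  # close to the origin: x ~ 0.001, y ~ -0.001
```

Exact fractions avoid floating-point round-off, so the equality test is a genuine check at each sampled parameter. The check still covers only finitely many parameters. The algebraic identity given above is what covers all t.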



2.3 Induction Principle on ℕ

Now that we have the notion of function, we can restate the induction principle (Version 2) given at the end of Section 1.8 to make it more flexible. We define a property of the natural numbers as any function, P: ℕ → {true, false}. The idea is that P(n) holds iff P(n) = true, else P(n) = false. Then, we have the following principle:

Principle of Induction for N (Version 3). 

Let P be any property of the natural numbers. In order to prove that P(n) holds for all n ∈ ℕ, it is enough to prove that

(1) P(0) holds and 

(2) For every n ∈ ℕ, the implication P(n) ⇒ P(n + 1) holds.
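Because a property is now just a function P: ℕ → {true, false}, it can be modeled directly as a Boolean-valued function. The sketch below tests conditions (1) and (2) only for n below a chosen bound. It is therefore a sanity check, not a proof. The principle is what lets a proof of (1) and (2) for all n yield P(n) for every n. The helper check_induction_hypotheses and the example property gauss are hypothetical names chosen for illustration.

```python
from typing import Callable

# A property of the natural numbers: a function P: N -> {true, false}.
Property = Callable[[int], bool]


def check_induction_hypotheses(P: Property, bound: int) -> bool:
    """Test (1) P(0) and (2) P(n) => P(n + 1) for 0 <= n < bound.

    Only finitely many instances are tested, so True is evidence, not a proof.
    """
    if not P(0):  # condition (1) fails
        return False
    for n in range(bound):
        # The implication P(n) => P(n + 1) fails only when P(n) and not P(n + 1).
        if P(n) and not P(n + 1):
            return False
    return True


def gauss(n: int) -> bool:
    """Example property: 0 + 1 + ... + n = n(n + 1)/2."""
    return sum(range(n + 1)) == n * (n + 1) // 2
```

For instance, the property n < 5 satisfies (1) but violates (2) at n = 4. The property n ≥ 1 fails (1). So neither property meets the hypotheses of the principle.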




